MIT Sloan Management Review
Online auction site eBay uses data about the behavior of its millions of customers to drive analytics at every level of the organization, and get closer to its customers.
You can find just about anything on eBay: A vintage BMW, a Lear jet, a half-million-dollar yacht. Or perhaps a domain name, industrial equipment, software and services from the likes of IBM, a food safari in San Francisco. Or even a previously undiscovered species, such as Coelopleurus exquisitus, a heretofore-unknown sea urchin sold on eBay.
The e-commerce giant has localized operations in over 30 countries, with 100 million registered users. The latest number of sellers listed by eBay in 2009 is well in excess of 1.5 million (it’s hard to tell the exact number of sellers, given buyers are often sellers and vice versa).
From all that activity stems a lot of data and, eventually, information — which eBay is capitalizing upon through the use of data analytics research. The results: eBay is much closer to its tremendous customer base than ever before, and it is able to iterate faster on fulfilling customer requirements.
In a conversation with MIT Sloan Management Review contributing editor Renee Boucher Ferguson, Neel Sundaresan, senior director of research at eBay, discusses how the company uses data and analytics at every level to continuously evolve eBay’s numerous sites and services for buyers and sellers.
Can you talk briefly about how eBay uses analytics?
Analytics at eBay is used at every level and scale. A/B tests are common in understanding user response to site or feature changes, and policy changes. These tests can get complex, as the site has many complementary and competing features and policies. So one has to be systematic in ensuring that the experiments are clean, and also in reading the results of the experiments and in attributing measures of success to the changes. Then, if the result reveals a positive or a negative response, the algorithm or systems designers can take that information and update the models or design better algorithms, features or systems and the policy makers can revisit the policies. Data from these experiments can come in various forms — user behavior data, transactional data, and customer service data.
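As an illustration of what keeping concurrent experiments "clean" can involve, here is a minimal sketch of deterministic variant assignment. It is not eBay's method: the function name `assign_variant`, the variant labels, and the choice of salting a hash with the experiment name are all assumptions made for this example.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to one variant of one experiment.

    Hypothetical helper: salting the hash with the experiment name means
    a user's bucket in one experiment does not determine their bucket in
    another, which helps when several tests run on the site at once.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Interpret the hash as an integer and spread users across variants.
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]
```

Because the assignment depends only on the user and the experiment name, the same user sees the same variant on every visit, and no assignment table has to be stored.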
What would you say are the biggest technical issues that you’re facing with data today?
Everybody talks about big data. The first aspect of this is building or implementing hardware and software systems that can handle data at a large scale, and can make them available and respond at speed and scale. The second aspect is tagging our software system to collect the right kind of data. The belief is that more data means more information. As researchers and data scientists we see that, while this is true — that adding more data brings in more features to consider and that it addresses some of the sparsity issue — it is also possible that more data can introduce more noise. Cutting the data the right way is key to good science.
One of the biggest tasks in this effort is data cleaning. For example, let’s say all these websites collect data, but the data is ridden with bots. They might be web crawlers from search engines like Google or Bing, or they might be some other agents that somebody has let loose to discover information from websites. But that makes it difficult to separate clean data from dirty data on our side.
A significant part of the skill required is the ability to look at data and cut it the right way.
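The bot problem described above can be sketched as a simple filtering pass over logged events. This is only a rough illustration under stated assumptions: the event fields (`client_id`, `user_agent`), the marker substrings, and the volume threshold are hypothetical, and real bot detection is considerably more involved.

```python
from collections import Counter

# Assumed substrings that self-identifying crawlers tend to include in
# their user-agent strings (for example "Googlebot" or "bingbot").
BOT_MARKERS = ("bot", "crawler", "spider")


def is_declared_bot(user_agent):
    """Return True if the user agent announces itself as a crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def filter_bot_traffic(events, max_events_per_client=100):
    """Drop events from declared bots and from unusually active clients.

    `events` is a list of dicts with hypothetical keys "client_id" and
    "user_agent". The volume threshold is an arbitrary illustrative value.
    """
    counts = Counter(event["client_id"] for event in events)
    return [
        event
        for event in events
        if not is_declared_bot(event["user_agent"])
        and counts[event["client_id"]] <= max_events_per_client
    ]
```

The volume check is there because agents that do not announce themselves cannot be caught by the user-agent string alone; any fixed threshold will misclassify some heavy human users, which is part of why this kind of cleaning is hard.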
What are the biggest management issues that you’re facing with data?
A funny thing about data is that the more you collect it, the more you want it. And you want it in shapes and forms that you did not think about before, because now it’s possible. So data is growing faster than ever before.
As you make analytics-driven decisions, you want to learn more. That means you have to tag more pages of the site, and you have to track more of the data, and you have to produce better reports. Suddenly you’re growing your data much faster than you thought you would be growing. The challenge of managing scale is primary here.
I believe, as a scientist, that no data should be thrown away, meaning you should always keep data around for better data science. Just keeping large amounts of data, managing them, is another challenge.
The third challenge is new kinds of data. For example, if you go back even six, seven years, most of the data was text data. Suddenly, with the massive adoption of smartphone devices, there’s a huge explosion of image and video data, location data, and other sensor data. Being able to deal with new kinds of data and understand these new kinds of data — understand what they mean — is a challenge. As personal devices get smarter, as we augment ourselves with more and more devices, be it the smartphone or watch or eye-wear, new kinds of data [are] going to be everywhere.
Analytics is starting to look quite different from what it looked like a while ago. Suddenly we are seeing new forms of data, and we need to be prepared to process this data really well and in a near-real-time manner.
What do you see as the biggest management opportunities connected with analytics, to change or enhance the way that you operate?
I think the biggest change you see is that everybody in the organization, whether they are a technical person, a researcher or an engineer, whether they’re a product manager, a businessperson, an individual contributor or a manager, has to be data driven.
Now, not everybody has to look at data, but everybody has to understand data at some level. And that’s a skill that most people were not trained for at school, or even in their previous jobs. And suddenly everyone has to know some basic statistics. They have to have some basic understanding of data.
A lot of data is coming from the behavior of millions of users on our site. So, being able to understand and kind of get your head around that data and the analysis is really important. You can think of it as an attitude change in all grades of people.
How do organizations go about implementing those skills sets for non-data analysts? Is it training?
Yes, a lot of it is training through courses with hands-on work, and some of it is just learning on the job. For example, if you’re a manager, you have to understand the graphs that your people produce, and you have to know what it means to say, “you know, it’s sort of statistically significant or within the noise.” You need to know what that means. Otherwise, you will make decisions that are not data-driven, which won’t be correct in this new world. The depth of the skills might be different for an engineer or a technical manager or a product manager. But everyone needs to understand and be able to make data-driven decisions at some level.
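To make "statistically significant or within the noise" concrete, here is a minimal sketch of one standard way to compare two conversion rates: a two-proportion z-test. The interview does not say which tests eBay uses, so the choice of test and the function name are assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a and conv_b / n_b are conversions and visitors in the
    two groups. Returns (z, p_value). Uses the normal approximation, so
    it assumes reasonably large samples.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value (a common but arbitrary cutoff is 0.05) suggests the difference is unlikely to be noise alone; a large one means the data cannot distinguish the two groups, which is different from showing they are the same.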
Are you using data analytics to compete in new ways or to compete more effectively than you did, say, a couple of years ago?
In areas where we didn’t even think we could use analytics, we can use it now, just because it’s available. And when it’s available, you find it useful. When it’s useful, you want more of it available. So, just pretty much everything seems to have turned into this data-driven space of doing things.
I’ll give you an example: Business decisions that were made through qualitative data analysis and surveys are now augmented or replaced by readings from real use of the product from large numbers of users. This provides both immediacy and scale to the analysis and decision.
eBay is unique in that we have a large amount of data, and we also have a large number of people using the system. We have 100 million registered users, which makes things interesting.
Let me give you an example from a paper we wrote last year. Economists often ask questions about consumer response to policy changes. They study the response by conducting either lab experiments or field experiments. The former is in an artificial setting, and the latter is at a limited scale. Let’s say the question is, “How do people internalize shipping costs in online commerce?” One way to run a field experiment is to buy 100 Pokémon cards or 100 DVDs, and then sell 50 with free shipping at some price and 50 at a different price with $5 shipping. And then they would collect all the data and slice and dice it, run regressions on it, and draw their conclusions.
What we found, when looking at this data at scale, was that our sellers and our buyers are already running field experiments for us. So, instead of asking, “Is free shipping a good policy or not?” — our sellers, especially our power sellers, are already running these experiments at scale. These are naturally occurring field experiments at web-scale! So we can look at the data and answer these questions — even before we create a policy or run an experiment.
And that’s a huge, huge change in how we look at data and make decisions.
Many of our power sellers are very smart about their business, and they know what to do. When they run these experiments, they see what happens. And they probably are correcting themselves when something happens. But at the same time, the data tells us the story that we want to hear.
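The idea of reading sellers' own listings as naturally occurring experiments can be sketched as follows. This is a simplified illustration, not the analysis from the paper mentioned above: the listing fields (`seller`, `item`, `item_price_cents`, `shipping_cents`, `sold`) are hypothetical, and matching on seller, item and total price is an assumed way to hold other factors roughly constant.

```python
from collections import defaultdict


def sell_through_by_shipping(listings):
    """Compare sell-through of free vs. paid shipping in matched groups.

    Listings are grouped by (seller, item, total price in cents). Only
    groups in which the seller tried both shipping policies are kept, so
    each retained group resembles a small seller-run experiment.
    """
    # Per group and arm: [number sold, number listed].
    groups = defaultdict(lambda: {"free": [0, 0], "paid": [0, 0]})
    for listing in listings:
        total = listing["item_price_cents"] + listing["shipping_cents"]
        key = (listing["seller"], listing["item"], total)
        arm = "free" if listing["shipping_cents"] == 0 else "paid"
        groups[key][arm][0] += int(listing["sold"])
        groups[key][arm][1] += 1

    totals = {"free": [0, 0], "paid": [0, 0]}
    for arms in groups.values():
        if arms["free"][1] and arms["paid"][1]:  # both policies present
            for arm in totals:
                totals[arm][0] += arms[arm][0]
                totals[arm][1] += arms[arm][1]

    # Sell-through rate per arm, or None if no matched listings exist.
    return {
        arm: (sold / listed if listed else None)
        for arm, (sold, listed) in totals.items()
    }
```

Unlike a designed experiment, sellers choose their own policies, so a comparison like this still needs care about what else differs between the matched listings, such as timing or listing quality.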
What are the implications of that understanding within your industry? And given your huge user base, are there wider implications?
I think it has wider implications because a lot of platforms, not just eBay, have users that are powerful, and there are constantly experiments occurring on the side. So, as an online site, you don’t have to actually go and experiment. For example, in the advertising space, advertisers had to advertise on Google or Bing. They stop advertising, then they advertise again.
Now, they might stop advertising because they run out of budget or because they want to study the effectiveness of their own advertising. That not only tells advertisers about the effectiveness of their advertising, but also tells the people who have access to the data, in this case Google or Bing, about the effect of an advertiser’s presence or absence in their Internet advertising space.
Suddenly you have new kinds of data that you could not imagine before. So, it’s only your cleverness in understanding the data or analyzing the data that can inform you better and faster. You understand your customers better.
Often you run surveys and the customers tell you something. But with surveys, you have the “squeaky wheel problem” — the ones who complain are the ones who complain a lot, while a lot of unhappy people may never say anything. They may just walk away from the site or suffer quietly, but at the same time are getting burned.
When you look at user behavior in a systematic way, you can often see the discrepancy between what they say and what they do. Some behaviors that cannot be captured in surveys are better seen in data. This helps us understand the friction points when they use our system.
That’s how data brings us closer to our customer than ever before. When you use analytics, you can go back to the customer and understand them better. You can create tools or deploy a product, see how they use it, and correct course — you can iterate much faster.
Where do you think the greatest opportunity for impact is in the future utilizing data analytics and big data analytics?
I believe data is everywhere, beyond the commercial context I discuss. You see that in the health space, certainly: with access to a lot more data, you can work with simple algorithms and you can answer questions at scale, at speed, much more than you could before. From local weather to climate change, allergies to pandemics, education, food, calamities, politics and medicine, every aspect of our daily lives, our economy and our livelihood starts to look different when we are driven by data. Data science is not just for organizations and scientists; it is for consumers and individuals as well. As business, social, political and personal decisions are driven by data and information, we can address problems in a more systematic and transparent way.
What are the challenges that lie ahead?
I think the challenges are that we are collecting a lot more data. There are questions about privacy and security and abuse that can happen with access to data in the hands of those who shouldn’t have access to it. While these are important issues, I will leave these issues out of this conversation. I’m looking only at the positive side and good use of data. The big challenges are, how quickly can you scale to be able to handle this data? And can the tools that come with the data scale at the same pace? While these are indeed challenges, we are lucky to live in the golden age of data; we should use it to benefit the good.