Tuesday, August 27, 2013

The Relevance of Data: Going Behind The Scenes at LinkedIn

MIT Sloan Management Review
August 26, 2013  
Deepak Agarwal (LinkedIn), interviewed by Renee Boucher Ferguson

It is an understatement to say LinkedIn is growing like a weed. It’s rare that any company — much less one started in its founder’s living room a little more than a decade ago — can boast the kind of growth LinkedIn has right now: every second, two or more people join the site. With 238 million members in over 200 countries, 2.8 million active company profiles, and 1 million professionally oriented groups, LinkedIn has become the world’s largest professional networking site. It’s an unrivaled achievement; literally none of its competitors (or imitators) has anywhere near that reach.
The company’s mission is deceptively simple: Connect the world’s professionals to make them more productive and successful. But its vision is something completely different. In a 2012 blog post, Jeff Weiner, LinkedIn’s CEO (@jeffweiner), explained how the company initially developed an infrastructure that could map its members’ professional relationships up to three degrees.
That reach — and the massive amount of data that’s connected and parsed through LinkedIn’s complex machine learning and optimization algorithms — has interesting implications for members and employers. But the vision extends to future global economic development. Deepak Agarwal, director of relevance science at LinkedIn, talks with MIT Sloan Management Review contributing editor Renee Boucher Ferguson about LinkedIn’s relentless focus on data relevance and what the Age of Analytics, coupled with LinkedIn data, could mean for the world.
Can you provide a bit of background on what relevance science means at LinkedIn?
Broadly speaking, the role of relevance science at LinkedIn is to improve the relevancy of products by extracting the signal from LinkedIn data. This is a difficult problem and requires an interdisciplinary approach. Our relevance scientists have diverse backgrounds spanning computer science, machine learning, optimization, statistics, information retrieval, economics, and software engineering. We have made a significant impact on products like advertising, the LinkedIn feed, news, job recommendations, people recommendations, and many others.
How does that work, in practical terms, to improve relevance at LinkedIn?
Since the inventory of items we can display to users (e.g., feed updates, ads, news, people, jobs and others) is selected from a very large and dynamic pool, it is infeasible to manually select the best items for every user visit. We have built sophisticated machine learning and optimization algorithms to automatically recommend the best “items” to users in a given context at scale. Such automation improves the relevancy of products at low marginal cost and hence contributes to the bottom line.
The algorithms we use are able to combine various data sources to perform such recommendations. We are fortunate to have rich profile data about our users; we know who they are connected to, and we understand past user interactions on various devices.
For users who visit very often, we are able to provide deeply personalized recommendations. Sporadic visitors are automatically grouped into homogeneous cohorts by algorithms, and we provide the best recommendations for each such cohort. We adapt our recommendations in real time based on what our users have consumed in the past. The entire end-to-end machinery has to work together to improve the relevance of our products.
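Agarwal describes two cases: per-user estimates for frequent visitors, and cohort-level estimates for sporadic ones. The minimal Python sketch below illustrates that split. It assumes an upstream clustering step has already assigned cohort labels. It ranks items by observed click rate and uses a hypothetical threshold, MIN_USER_VIEWS, to decide who counts as a frequent visitor. None of these names come from LinkedIn, and a production system would be far more elaborate.

```python
from collections import defaultdict

# Hypothetical cutoff (an assumption, not LinkedIn's): users with at least this
# many logged impressions are treated as frequent visitors and get estimates
# from their own history; everyone else falls back to their cohort.
MIN_USER_VIEWS = 20

def record(stats, key, item, clicked):
    """Fold one impression into the running [clicks, views] counts for (key, item).

    Because this only increments counts, it can be called as each event arrives,
    which is one simple way to keep estimates current.
    """
    counts = stats[(key, item)]
    counts[0] += int(clicked)
    counts[1] += 1

def build_stats(events):
    """Aggregate (key, item, clicked) events into a dict of [clicks, views]."""
    stats = defaultdict(lambda: [0, 0])
    for key, item, clicked in events:
        record(stats, key, item, clicked)
    return stats

def recommend(user, cohort, items, user_stats, cohort_stats, user_views):
    """Rank items by observed click rate.

    Frequent visitors are scored on their own history; sporadic visitors are
    scored on the pooled history of their (precomputed) cohort.
    """
    if user_views.get(user, 0) >= MIN_USER_VIEWS:
        key, stats = user, user_stats
    else:
        key, stats = cohort, cohort_stats

    def click_rate(item):
        # .get avoids inserting empty entries into the defaultdict.
        clicks, views = stats.get((key, item), (0, 0))
        return clicks / views if views else 0.0

    return sorted(items, key=click_rate, reverse=True)
```

For example, a user with 25 logged impressions would be ranked from their own counts. A user with only 2 would be ranked from their cohort's pooled counts, even if their few personal clicks point elsewhere.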
You mentioned extracting the signal from data. Can you explain what that means to you?
At the heart of extracting signal from data and providing great recommendations on different LinkedIn pages is the ability to predict user response to different items. For instance, how likely is the user to click on an ad by a big advertiser when shown on the profile page? Will the user share an article liked by another user who works for the same company? Will the user’s propensity to click on an ad on the LinkedIn homepage decrease when we recommend a friend’s job update on the feed? Such predictions are difficult to make due to the curse of dimensionality (we can slice and dice the data into a large number of segments at LinkedIn).
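Crossing even a few dimensions multiplies the number of segments, so many segments end up with only a handful of impressions. A common way to cope (not necessarily the method LinkedIn uses) is to shrink each segment's raw click-through rate toward a global prior. That way, a segment with one click in two impressions is not read as a 50% CTR. The sketch below assumes a Beta prior with mean prior_ctr and pseudo-count prior_strength. The dimension sizes and prior values are hypothetical.

```python
import math

def num_segments(dimension_sizes):
    """Number of distinct segments when every dimension is crossed with every other."""
    return math.prod(dimension_sizes)

def smoothed_ctr(clicks, views, prior_ctr, prior_strength=100.0):
    """Posterior-mean CTR under a Beta(prior_ctr * m, (1 - prior_ctr) * m) prior.

    m = prior_strength behaves like a number of pseudo-impressions: with few real
    views the estimate stays near prior_ctr, and with many it approaches clicks / views.
    """
    return (clicks + prior_ctr * prior_strength) / (views + prior_strength)

# Hypothetical dimensions: 5 page types x 20 industries x 50 countries x 10 ad categories.
print(num_segments([5, 20, 50, 10]))                          # 50000
# A sparse segment (1 click in 2 views) stays close to the 1% prior.
print(round(smoothed_ctr(1, 2, prior_ctr=0.01), 4))           # 0.0196
# A well-observed segment (500 clicks in 10,000 views) keeps roughly its raw 5% rate.
print(round(smoothed_ctr(500, 10_000, prior_ctr=0.01), 4))    # 0.0496
```

The prior strength controls the trade-off: a larger value makes sparse segments more conservative but also slows how quickly a genuinely different segment departs from the global average.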
This is where sophisticated algorithms, and the software engineering to deploy them, become germane. In addition, connecting metrics that machines can readily optimize to the overall business goals is a challenge we continuously work on at LinkedIn. In the end, extracting signal is important, but it is even more important to connect it to the overall business goals. This requires harmony and close collaboration among engineers, scientists and product innovators. That’s where the great work culture at LinkedIn comes into the picture.
What are some interesting applications of analytics that you use today that might not have been possible a couple of years ago?
LinkedIn’s greatest asset is its data. We have been a successful company primarily because of the analytics we have built to extract the maximum value out of our data. Data analytics is like oxygen at LinkedIn; almost all our products are based on the use of data: People You May Know, endorsements, Who Viewed My Profile, LinkedIn Today, LinkedIn Feed, Jobs You May Be Interested In. The list is endless. And last but not least, there are our rich search capabilities for both users and recruiters, which help connect talent to opportunity at scale.
But of all the products, I think LinkedIn’s recent foray into original content and content marketing is something that would not have been possible a few years ago. It has redefined LinkedIn as a platform where professionals don’t merely come to post their resumes and find jobs, but to consume information that makes them better at what they do every single day.
This has only been possible through the launch of products like Influencers, which has enabled LinkedIn to become a platform for original, high-quality professional content. The launch of sponsored updates has given companies a medium for meaningful communication with professionals. I am really looking forward to seeing how this story evolves over the next few years.
What are some of the technical issues that you are facing with data?
As with any other consumer-centric Web company, the two most important issues we face with data are scale and heterogeneity. Processing data at the scale of LinkedIn requires non-trivial infrastructure innovation. Heterogeneity in terms of information available is more of an algorithmic challenge.
At the end of the day, I think both algorithmic and infrastructure innovation should be done together. Some hard infrastructure challenges can be solved through clever algorithmic approximation, and similarly an almost intractable algorithmic problem can be tamed by good infrastructure for computation. LinkedIn is at the forefront of such innovation, but a lot more needs to be done.
What do you see as the biggest management opportunities connected with analytics?
I think the kind of data that LinkedIn has is unique: data that gives you a CV of professionals, data that tells you how professionals are linked to each other, data that tells you about the educational background and career progression of LinkedIn users over time, and so on. I believe there is a huge opportunity for management to use this data to generate insights and new products that would not be possible elsewhere. And I also believe LinkedIn has done a great job with it. But a lot more can be done and is being done.
How do you see that power of LinkedIn data evolving?
This is perhaps me theorizing more than anything else. As more professionals from various disciplines and countries embrace LinkedIn, I see the potential for it to become the platform that provides tools to democratize learning. If we succeed at using all this data to better connect talent with opportunity and make professionals better at their jobs, the entire human race could become more efficient, and we could even move the GDP of nations.
Because I was born and brought up in a middle-class Indian family, I know how difficult it is for a majority of folks in developing countries to access the knowledge and resources that are essential to compete in the global marketplace. I believe LinkedIn can level this playing field. I know this sounds far-fetched, but I truly believe LinkedIn can one day help us achieve that, if we do things right. It is an ambitious goal and could take years, but I truly believe we will ultimately get there.
Where do you see the greatest opportunity for impact in using data analytics in the future?
We are living in a digital age. Many of the interactions that we do in our day-to-day lives today are being digitized. And given the sea of data we are all producing and recording and living in, information overload is going to be the biggest problem for the coming generations. That’s also the biggest opportunity I see for data.
If you think of the previous century as the age of physics, the next stage is the age of data analytics. How do we take the sea of data that we are digitally recording every single day, glean the most interesting aspects from it, and use them in meaningful ways to make our lives better? That’s where the complexity lies.
What is relevant varies across users. So we have to find smart ways of determining what is most relevant for a given user in a given context and then surface it appropriately.
It is a daunting task and no one discipline can solve it. We have to take an interdisciplinary approach to solve one of the most pressing problems of our times — information overload. If done properly, it can change the human race in a fundamental way. I feel fortunate to be part of this digital revolution; there could not be a better time for an analyst like me to be alive.
ABOUT THE AUTHOR
Renee Boucher Ferguson is a researcher and editor at MIT Sloan Management Review.
