The Tricky Business of Big Data

Ankit Buddhiraju

‘Big data’ is quickly gaining currency as the new mantra of the digerati. Along with other buzzwords like ‘data mining’ and ‘machine learning,’ big data promises to help businesses not only identify the underlying patterns in their treasure troves of consumer and operations data, but also leverage these patterns to increase sales, boost productivity, and gain a significant competitive advantage [1]. Companies such as Inrix have burgeoned to meet the growing demand for big data analytics from individuals and corporations alike, spurred by success stories like Amazon’s product recommendation system and Google’s accuracy in tracking the spread of the 2009 flu virus in real time [2]. Big data has even been championed at the national level, with the White House spearheading a $200 million ‘Big Data Initiative’ in 2012.


Many regard big data as a formidable tool to supplement the human intuition and experience that have always directed the decision-making process. But others believe that it is precisely this paradigm that big data now threatens to besiege. Anderson’s ‘The End of Theory’ asks: if big data analytics can generate actionable insights by reliably revealing what customers do, what use do we have for ‘experts’ to divine why they do it [3]? Mayer-Schönberger and Cukier’s book Big Data agrees, arguing that the data itself – and not the skills or mindset needed to understand it – will become the most sought-after commodity [2]. Here we examine the tools and trends in big data, ending with a discussion of which stages in the big data value chain are likely to reap the highest returns on investment.


What Is Big Data?

A prevalent model for describing big data is Gartner’s ‘3 Vs’: volume, velocity, and variety [4]. With far too many petabytes of information to manipulate within a single database, businesses must employ novel software and hardware architectures like Hadoop (an open-source framework developed largely at Yahoo!), and new paradigms like Google’s celebrated MapReduce, which processes the data in more manageable chunks. These paradigms are then implemented with ‘massively parallel software running on tens, hundreds, or even thousands of servers’ [5]. Moreover, thanks to a panoply of modern technologies like pressure sensors, GPS, and gyroscopes now installed in everything from vehicles to factories and houses to phones, we can ‘datafy’ real-world phenomena – location, friendships, and personal health – at a faster rate than ever before.
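
To make the idea of processing data 'in more manageable chunks' concrete, here is a minimal single-machine sketch of the MapReduce pattern in Python, using word counting as the task. It is an illustration only, not Google's or Hadoop's implementation: the function names (split_into_chunks, map_phase, shuffle_phase, reduce_phase, word_count), the chunk size, and the sample records are all assumptions made for this example, and the chunks are processed one after another rather than on separate servers.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, chunk_size):
    """Split the input into chunks small enough for one worker to handle."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, chunk_size=2):
    """Run the three phases in sequence over hypothetical text records."""
    chunks = split_into_chunks(records, chunk_size)
    # On a real cluster each chunk would be mapped on a different server;
    # in this sketch the chunks are simply handled in turn by one process.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample records, invented for this illustration.
    records = [
        "big data big promise",
        "data beats intuition",
        "intuition guides data",
    ]
    print(word_count(records))
```

Because each call to map_phase looks only at its own chunk, the map step can in principle be spread across as many machines as there are chunks; only the shuffle step needs to bring together results from different chunks.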


The Evidence

Big data has unequivocally transformed the companies that have chosen to embrace it. It can serve as the building block for unique business models based on servicing previously unrealized consumer needs – Oren Etzioni’s company Farecast combs through millions of plane ticket records to recommend when customers should purchase their tickets for the lowest fares. Retailers like Walmart tap into a wealth of customer transaction data, seeing the potential to improve operating margins by 60 percent [1]. Others sell their expertise to the public sector: Experian helps the IRS cheaply estimate a person’s income from her credit history, while PriceStats, originally an MIT project, provides far more accurate real-time estimates of inflation than the problematic CPI [2]. The 2011 McKinsey report on big data cites several other ways in which big data creates value for businesses, including increased transparency, improved customer segmentation, and a more scientific approach to organizational leadership [6].


The case for wedding computations to corporations transcends the anecdotal – big data has been used as its own critic. A comprehensive study of 330 public North American firms revealed that ‘companies in the top third of their industry in the use of data-driven decision making were, on average, 5% more productive and 6% more profitable than their competitors,’ results that remained statistically significant even after ‘accounting for the contributions of labor, capital, purchased services, and traditional IT investment’ [7]. The caveat, of course, lies in the elusive phrase ‘data-driven decision making,’ which Jeanne Ross reckons few companies actively espouse, even for their presently ‘small’ data. Without a culture of management actively empowering its employees to make worthwhile choices based on relevant data, firms fail to profit from the information they already have, and ‘companies don’t magically develop those competencies just because they’ve invested in high-end analytics tools’ [8].


The first logical step, it would seem, is for businesses to cultivate an appreciation and application of evidence-based policymaking at all levels of the organization. Only if employees are acclimatized to the process of responsibly collecting, probing, and acting upon meaningful data (while maintaining privacy and intellectual property rights) can we confidently sow the seeds of big data analytics. By the time this mindset goes mainstream, many current objections to big data that have stymied its adoption – medium to high technology costs, incompatible standards, and privacy issues [9] – are likely to lose traction, since the infrastructure will become more familiar and less expensive. And if these incentives do not persuade everyone immediately, hopefully the mounting evidence for data-driven management’s competitive advantage will entice the most obstinate industries to jump on the big data bandwagon.


The Big Data Value Chain

The argument for big data may be cogent, but when push comes to shove, big data is a disruptive innovation demanding an altered worldview. Companies will want to know how to position themselves in the new world order, and how to generate the highest returns on this newfangled asset. Treating data with the same reverence as labor and capital, Mayer-Schönberger and Cukier discuss the ‘big data value chain,’ comprising three layers: data, skills, and mindset [2].


The first layer – data – refers to the data holders: those who collect the data or control access to it. The other layers refer to the ‘data scientists’ – those with the requisite ‘deep analytical training’ [6] – and the individuals who dream up big data applications. While many clamor about an estimated shortage of 190,000 data-savvy professionals [6], Mayer-Schönberger and Cukier downplay these concerns, stating, “as more people acquire the expertise … eventually most value will be in the data itself” [2].


While the deficit of skilled data scientists will gradually evaporate as the needs of business trickle down to academia, Mayer-Schönberger and Cukier’s assertion that the data dominates the value chain seems slightly misplaced. Their premise – those with access to the data will soon ‘wake up’ to its true value and cease to share it with third parties – does not necessarily imply that data scientists will become less valued. Rather, it makes a compelling case for businesses to start in-housing that kind of talent where possible, securing a larger share of the profits in the process. Some companies like MasterCard already do this, choosing to create subsidiaries (e.g. MasterCard Advisors) dedicated to big data analytics instead of licensing their records to independent contractors [2]. The data is a bargaining chip, a means of leverage to attract the best talent. It does not undermine the need for the skills or the mindset; it just affords businesses the luxury to be picky.


None of this matters if the ‘best’ analysts cannot squeeze significantly more juice out of their data than merely ‘good’ analysts, which is what Mayer-Schönberger and Cukier assume will happen as the ‘skills’ become more commonplace. Yet this merits a discussion of what these ‘skills’ actually are. The ones that receive the most attention, of course, are statistics and programming expertise. These truly are diminishing in relative value (the latter especially) because of how easily they can be delegated and even outsourced. But the devil is in the detail. Dig deeper into the algorithms – with fancy names like ‘latent semantic analysis’ and ‘unsupervised cluster detection’ – and you find that they can very easily identify which variables are correlated, but only once we tell them what the interesting variables are to start with [10].


And that is precisely the point. Statistics can easily confirm that treating consumer fraud as a DNA sequencing problem vastly improves fraud detection [11] – but only because someone had the bright idea to make the connection between these two disparate fields. Statistics can easily confirm to earthquake physicist Didier Sornette that stock prices before a financial crash mimic tectonic plates before a seismic rupture [12] – but only once he intuited that they might be related at all. We will always need talented domain experts, not to do the grunt work of coding up statistical analyses, but to ask the right questions, collect the right data, communicate effectively with the upper echelons of management, and incorporate creative ways of thinking from other disciplines. Perhaps our optimism in assuming that our current data can reveal absolutely any trend is really complacency, a result of past experts’ decisions to collect that data in the first place.


The diffusion of analytical know-how only affects the nature of expertise, not its necessity. We must invest in big data by first investing in a data-minded outlook – namely, that data has the immense potential to provide, deride, and decide our notions of the world’s inner workings. Our prospects hinge on the ability of business to manage big data, and big data to manage business.


[1] McGuire, Tim, James Manyika, and Michael Chui. “Why Big Data is the New Competitive Advantage.” Ivey Business Journal (2012).
[2] Mayer-Schönberger, Viktor, and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Eamon Dolan/Houghton Mifflin Harcourt, 2013.
[3] Anderson, Chris. “The End of Theory.” Wired Magazine 16 (2008).
[4] Genovese, Yvonne, and S. Prentice. “Pattern-Based Strategy: Getting Value From Big Data.” Gartner Special Report (2011).
[5] Jacobs, Adam. “The Pathologies of Big Data.” Communications of the ACM 52, no. 8 (2009): 36-44.
[6] Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela H. Byers. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute (2011).
[7] McAfee, Andrew, and Erik Brynjolfsson. “Big Data: The Management Revolution.” Harvard Business Review 90, no. 10 (2012): 60-66.
[8] Ross, Jeanne W., Cynthia M. Beath, and Anne Quaadgras. “You May Not Need Big Data After All.” Harvard Business Review 91, no. 12 (2013).
[9] Jordan, John. “The Risks of Big Data for Companies.” The Wall Street Journal (2013).
[10] Hales, David. “Lies, Damn Lies and Big Data.” Synthesis (2013).
[11] Davenport, Thomas H., and D. J. Patil. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review 90, no. 10 (2012): 70-77.
[12] Weatherall, James Owen. The Physics of Wall Street: A Brief History of Predicting the Unpredictable. Houghton Mifflin Harcourt, 2013.