Big data and why it will transform business

Nikolai Myllymäki

Data as a topic for discussion has emerged from the server rooms of IT departments to the headlines of business journals, acquiring the prefix "big" along the way to supposedly distinguish it from the older, smaller and less useful data of yesterday. But is big data just a buzzword, or does the new set of technologies create genuine transformative forces in business? Mounting evidence from successful implementations and the rapid development of technology suggests that big data could live up to the hype, even though there are significant pitfalls on the way for both individual companies and society in general.

Do companies gain value from big data?

When thinking of data as a competitive advantage, the purest examples of successful implementation are the internet behemoths whose businesses have been built on data from the very beginning: Google, Amazon and the social media companies Facebook and LinkedIn. These companies have often been on the front line of big data innovation, but to assess the relevance of big data to every business sector, one must also examine successes outside the IT sector.

Some of the most promising results in big data use have been achieved in retail. Wal-Mart has been successfully outperforming competitors[i] since the late 1980s, when it started sharing its inventory data with its suppliers to streamline its supply chain. Tesco and Target have used customer loyalty programs to collect extensive datasets on their customers, and now use big data analysis tools to gain insights for marketing, pricing and product choices. Big data allows the segmentation of customers into ever-smaller micro-segments, or even the personalization of products, and the retail sector has advanced in big data usage thanks to its access to personal customer data from loyalty programs and customer accounts in online shopping.
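At its core, the micro-segmentation described above is a clustering problem. As a minimal sketch (the loyalty-card figures below are invented for illustration, and real retailers use far richer features and more scalable tooling), customers could be grouped by spend and visit frequency with a plain k-means loop:

```python
import random

def kmeans(points, k, iterations=20, seed=42):
    """Plain k-means: assign each point to its nearest centre,
    then recompute each centre as the mean of its assigned points."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])),
            )
            clusters[nearest].append(p)
        # New centre = mean of the cluster; keep the old centre if a cluster is empty.
        centres = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters

# Toy loyalty-card data: (monthly spend in euros, visits per month)
customers = [(520, 12), (480, 10), (60, 2), (75, 3), (300, 6), (310, 7)]
centres, clusters = kmeans(customers, k=2)
```

Each resulting cluster is a candidate micro-segment that marketing or pricing decisions can then target separately.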

Looking at other sectors, the delivery operator UPS has cut the distance driven by its delivery vans by 85 million miles a year by using locational and other data to optimise routes[ii]. The cost savings in reduced fuel consumption are vast. Kaiser Permanente, a US healthcare provider, pooled its patient records and financial datasets and saved $1 bn in annual costs by applying big data analytics to these data[iii]. Kaiser's databases have also demonstrated another use of big data by helping to spot side effects of the drug Vioxx, resulting in the recall of said drug.

So the phenomenon has already had an impact on some quite different areas of well-established business, and other companies are sure to follow suit in the face of increased competition. So is big data going to make everyone healthier and richer? Possibly yes, but the road built on zeroes and ones is not without its potholes.

A tool like any other

While it has an impressive track record, big data is not a cure-all for the problems of businesses or society in general. It performs better in some situations and worse in others. Nassim Nicholas Taleb points out[iv] that differentiating between the noise and the signal in data is of utmost importance. He goes as far as to claim that adding data usually just adds noise.

Even though I do not agree with this latter claim, it is true that as more data is generated and analyzed, one will unavoidably find more spurious correlations in addition to real causal relationships. Mindlessly jumping on every "customer sentiment change", which could be nothing but random variation, could cause a company to lose focus and damage both its image and its profits. Information derived from data has to be analyzed and acted upon correctly, and according to McKinsey[v] the people with the necessary skills to do this could be hard to find in the coming years.
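The noise problem can be illustrated numerically. The sketch below (sample sizes and thresholds chosen arbitrarily for illustration) correlates a few hundred streams of pure random noise against one another; even though no real relationship exists anywhere, some pairs will look strongly correlated by chance alone:

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
# 200 independent "business metrics", each observed 20 times: pure noise.
series = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(200)]

# Count pairs that look "strongly correlated" (|r| > 0.6) by chance alone.
spurious = sum(
    1
    for i in range(len(series))
    for j in range(i + 1, len(series))
    if abs(pearson(series[i], series[j])) > 0.6
)
print(spurious)  # a decidedly non-zero count, despite zero real relationships
```

The more metrics a company tracks, the more such phantom patterns it will find, which is exactly why acting on every apparent correlation is dangerous.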

Most probably we will see disastrous flops in data interpretation in addition to the hoped-for great successes. One should never have blind faith in a model, however elegant, or forget the basics of the field or phenomenon the model is used to examine. After all, the basic principles of a given field are usually the most tested and sound piece of data one can find.

Big data is well suited to analyzing and improving ongoing operations, sometimes giving surprising insights into what is actually happening in the world. But it is by its nature backward-looking, or at best a measurement of current events. It is possible to spot changes in macro trends early on, but it would be a mistake to think of this as foresight. Genuine outside-the-box innovation is incompatible with a purely data-driven mindset, and data is useless in predicting the so-called "black swan" events that are unforeseen but have vast impact when they happen. Thinking that data can help to predict the unpredictable is a recipe for disastrous overconfidence.

In the shadow of the data cloud

In addition to the possibility of stumbling in implementation, the exponentially expanding use of data also has some genuinely dark characteristics. The privacy concerns posed by the technology are huge, as an ever-growing mountain of individual data is collected, analyzed and used for profiling people. Individuals could soon face a world where they could not opt out of data collection even if they wanted to. For example, hackers might steal large amounts of medical or locational data from servers, and with more data inevitably come more opportunities for break-ins. Even abandoning all electronic devices could become an ineffective way of hiding from spying eyes, as CCTV cameras or the more futuristic small unmanned aerial vehicles (UAVs) could soon be cheap and effective enough to monitor large areas and use face recognition software to track the movements of individuals. While said technologies could have great utility in, say, the security surveillance of industrial sites, the possibilities for misuse by undemocratic governments or criminals are also vast.

International legislation regulating the use of data is woefully underdeveloped, and it will be hard to keep it up to date considering the rapid development of technology and the difficulties of multi-country legal negotiations. Thus the companies producing the new wave of products and services have more power and responsibility than they should have in defining the new rules of engagement of the digital world. These companies have to be very careful to avoid causing serious harm to society (and to their businesses, in the form of the resulting publicity backlashes, of course).

Trends of today

Use of data as a business tool is not a new idea; large companies have done it for decades. What differentiates big data from the ordinary kind is the scale and the novel ways it is used, for example in automatic image and natural language processing. Natural language processing is already used by companies for sentiment analysis of customers in social media, and automatic image recognition is used by, for example, Google's self-driving cars. These technologies enable software to process unstructured data and integrate the obtained information with structured databases. Combined with machine learning, these technologies could become major drivers of new value for companies, as technology allows formerly laborious work conducted by people to be replaced by machines and wholly new business areas to be developed.
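To make the sentiment-analysis use case concrete, here is a deliberately naive sketch: score each social media post by counting positive versus negative words. The word lists and messages are invented for illustration; production systems use trained statistical models rather than simple word counting, but the input-to-structured-output shape is the same:

```python
# Tiny hand-picked sentiment lexicons (illustrative only).
POSITIVE = {"great", "love", "fast", "excellent", "happy"}
NEGATIVE = {"terrible", "slow", "broken", "hate", "refund"}

def sentiment(message: str) -> int:
    """Score a message: +1 per positive word, -1 per negative word."""
    words = message.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the new store layout and checkout was fast",
    "Delivery was slow and the package arrived broken",
]
scores = [sentiment(p) for p in posts]
print(scores)  # [2, -2]
```

The output is exactly the kind of structured signal (a number per customer message) that can then be joined against a conventional customer database.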

Companies will also be able to use more and more external datasets in the future as government data is made public, the social media giants create and sell more data, and information companies such as Palantir make connections between different databases and sell these services to third parties. The benefits offered by cloud services, such as near-instant scalability and low barriers to market entry, mean that startups can now challenge the biggest players in their fields. The creative destruction of capitalism will allow new and better business models to replace older ones.

Promises for tomorrow

Despite all the other interesting new ways of doing business with big data, in my opinion the most important characteristic of the phenomenon lies in its potential for huge scalability. As the different uses for data diverge and proliferate, the Economist[vi] points to the commoditization of big data technology. A company vying for more data processing capability no longer has to build its own data centres from scratch, but can use more and more external service-based solutions and open-source software such as Hadoop, the de facto toolkit for big data analysis.
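Hadoop's core programming model, MapReduce, is what makes that commoditization possible: a job is written as a map step that can run in parallel on many machines and a reduce step that aggregates the results. The canonical example is counting words. Below is a single-process imitation of that job (real Hadoop distributes these same two functions across a cluster and handles the shuffle between them):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Each mapper sees one shard of the input and emits (word, 1) pairs.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(mapped):
    # Shuffle: group values by key; reduce: sum each group.
    groups = defaultdict(int)
    for key, value in mapped:
        groups[key] += value
    return dict(groups)

lines = ["big data is big", "data about data"]
counts = reduce_phase(chain.from_iterable(map_phase(l) for l in lines))
print(counts["data"])  # 3
```

Because the map step has no shared state, the same code scales from two lines of text on a laptop to terabytes on a rented cluster, which is precisely the scalability argument made above.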
The really interesting possibilities lie in extrapolating this trend into the future, combined with Moore's law, the exponentially growing stream of data and the increasing sophistication of analysis tools. Even now a large chunk of data is stored and analysed with software of similar origins and shared between different entities[vii]. If different data sources and analysis platforms become increasingly compatible, co-dependent and interactive, a big data ecosystem is born. The leap from tailored IT services for each individual company towards an increasingly innovative, competitive, information-sharing ecosystem could transform business in the same way that smartphones have transformed the way people connect to each other. This is why big data is a big deal.




[iii] The Economist, February 1st 2014, "Measuring health care: Need to know"

[iv] Nassim Nicholas Taleb: "Antifragile: Things That Gain From Disorder" (Random House, 2012)

[vi] The Economist, May 19th 2012, "Big data"

[vii] IDC white paper: "Trends in Enterprise Hadoop Deployments", Ashish Nadkarni and Laura DuBois, 2013


Hi Nikolai,

Congratulations on the AIC trip!

I had written on the same topic myself and find a lot of similarities between the two pieces. The only part where I tend to disagree with you is where you mention that Big Data can't predict the so-called Black Swan events. In my research, I came across much evidence showing that it was traditional data, with its dependence on random sampling and hence the likelihood of missing very rare events, that was unable to predict such events. Big Data, which usually incorporates each and every event, was indeed capable of tracking down such rare events (such as banking fraud). Hope you can share your thoughts on that.

As regards the development of new tools for analyzing Big Data, do you see anything promising on the horizon which could handle large volumes as well as process information in real time?

Hi Prabhat, and thanks for the comment!
You are absolutely right about the power of Big Data in searching for rare occurrences amongst piles of observations or events. By combining different data sources, people can make connections and predictions where there were previously none to be found. Spotting banking fraud is a good example of this, as would be the now-infamous way Target spotted its customers' pregnancies.
In my opinion, however, these are not quite Black Swans. Nassim Nicholas Taleb gives three criteria for Black Swans: rarity, extreme impact, and retrospective (though not prospective) predictability. Insurance fraud and customer pregnancy are rare events, but they do not quite fit the other two criteria. While fraud can be damaging for an insurance company, single fraudulent claims are not big enough to upend the firm. On predictability, insurance companies usually expect a certain amount of fraud to happen, so they use Big Data to search for where these fraudulent claims emerge.
I would argue that a Black Swan in insurance would be a bit different: a single, unpredictable event that results in huge amounts of claims the companies are not prepared for. For example, an asteroid destroying London, or all the crops in Italy freezing mid-summer. For events like these, data (big or small) does not help, because we have no observations of previous similar events. After all, Black Swans are unpredictable by definition.

As for the tools for big data analysis, I'm not a computer scientist and thus try to tread carefully when talking about software. From an economist's point of view, I think Intel's recently acquired stake in Cloudera is an interesting development, as it seems like Hadoop developers are consolidating. The platform has a lot of R&D money pouring in, so it is bound to become more effective as different distributions compete for customers. Of course, smaller startups are always something to watch out for; the next big thing could already be forming in a garage somewhere. You never know!

I think the article is good as a layman's overview of big data, but it doesn't flesh out the core change that data brings: the key focus should be moving from a need to seek causality to a comfort with prediction and more sophisticated models of social systems, something that much of traditional social science has failed at miserably in the past century.

Your point about "Most probably we will see disastrous flops in data interpretation in addition to the hoped-for great successes" is more relevant to the past. The economics profession is fraught with horrendous misuses of basic regressions, a lack of out-of-sample testing, a need to seek causality when there is none, and a lack of basic understanding of Occam's Razor. The rise in the variety of features now available has pushed the development of much more sophisticated prediction and classification algorithms, which the economics community has failed to embrace.

Hi Henry, thanks for the thoughts!
The quality of scientific research is indeed something to be concerned about, not least in the economics profession. Data massaging in science is an endemic problem. It will be interesting to see how these new techniques and technologies affect the field. Hopefully the discipline will embrace change where it's due, and give more weight to scientific rigour.