Data as a topic for discussion has emerged from the server rooms of IT departments to the headlines of business journals, getting the prefix "big" on the way, supposedly to distinguish it from the older, smaller and less useful data of yesterday. But is big data just a buzzword, or does the new set of technologies create genuinely transformative forces in business? Mounting evidence from successful implementations and the rapid development of technology suggests that big data could live up to the hype, even though there are significant pitfalls on the way for both individual companies and society in general.
Do companies gain value from big data?
When thinking of data as a competitive advantage, the purest examples of successful implementation are the internet behemoths whose businesses have been built on data from the very beginning: Google, Amazon and the social media companies Facebook and LinkedIn. These companies have often been on the frontline of big data innovation, but to assess the relevance of big data to every business sector, one must also examine successes outside the IT sector.
Some of the most promising results in big data use have been achieved in retail. Wal-Mart has been outperforming competitors[i] since the late 1980s, when it started sharing its inventory data with its suppliers to streamline its supply chain. Tesco and Target have used customer loyalty programs to collect extensive datasets on their customers, and now use big data analysis tools to gain insights for marketing, pricing and product choices. Big data allows the segmentation of customers into ever-smaller micro-segments or even the personalization of products, and the retail sector has advanced in big data usage thanks to its access to personal customer data from loyalty programs and online shopping accounts.
Looking into other sectors, the delivery operator UPS has cut the distance driven by its delivery vans by 85 million miles a year by using locational and other data to optimise routes[ii]. The cost savings from reduced fuel consumption are vast. Kaiser Permanente, a US healthcare provider, pooled its patient records and financial datasets and saved $1 bn in annual costs by using big data analytics on these data[iii]. Kaiser's databases have also shown another use of big data by helping to spot side-effects of a drug called Vioxx, resulting in the recall of said drug.
So the phenomenon has already had an impact on some quite different areas of well-established businesses, and other companies are sure to follow suit in the face of increased competition. So is big data going to make everyone healthier and richer? Possibly yes, but the road built on zeroes and ones is not one without holes in it.
A tool like any other
While it has an impressive track record, big data is not a cure-all for the problems of businesses or society in general. It performs better in some situations and worse in others. Nassim Nicholas Taleb points out[iv] that differentiating between the noise and the signal in data is of utmost importance. He goes as far as to claim that adding data usually just adds noise.
Even though I do not agree with this latter claim, it is true that when more data is generated and analyzed, one will unavoidably find more spurious correlations in addition to real causal relationships. Mindlessly jumping on every "customer sentiment change", which could be nothing but random variation, could cause a company to lose focus and result in damage to its image and profits. Information derived from data has to be analyzed and acted upon correctly, and according to McKinsey[v] the people with the necessary skills to do this could be hard to find in the coming years.
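The point about spurious correlations can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not an analysis of real customer data: it generates purely random, unrelated "metrics" and counts how many pairs nevertheless look correlated. The function names, the 30 observations per metric and the 0.4 cut-off are all hypothetical choices made for the example.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_metrics, n_obs=30, threshold=0.4, seed=0):
    """Count pairs of independent random series whose |r| reaches the threshold.

    Every series is pure Gaussian noise, so any pair counted here is a
    spurious correlation by construction. All parameters are illustrative.
    """
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_metrics)]
    return sum(
        1 for a, b in combinations(series, 2) if abs(pearson(a, b)) >= threshold
    )


if __name__ == "__main__":
    for n in (5, 20, 80):
        pairs = n * (n - 1) // 2
        print(f"{n} metrics, {pairs} pairs: {count_spurious(n)} look correlated")
```

The number of pairs grows roughly with the square of the number of metrics tracked, so the number of chance "findings" grows with it even though none of the series has anything to do with any other.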
Most probably we will see disastrous flops in data interpretation in addition to the hoped-for great successes. One should never have blind faith in a model, however elegant, or forget the basics of the area or phenomenon the model is used to examine. After all, the basic principles of a given field are usually the most tested and sound piece of data one can find.
Big data is well suited to analyzing and improving ongoing operations, sometimes giving surprising insights into what is actually happening in the world. But it is by its nature backward-looking, or at best a measure of current events. It is possible to spot changes in macro trends early on, but it would be a mistake to think of this as foresight. Genuine thinking-outside-of-the-box innovation is incompatible with a data-driven mindset, and data is useless in predicting the so-called "black swan" events that are unforeseen but have vast impact when they happen. Thinking that data can help to predict the unpredictable is a recipe for disastrous overconfidence.
In the shadow of the data cloud
In addition to the possibility of stumbling in implementation, the exponentially expanding use of data also has some genuinely dark characteristics. The privacy concerns posed by the technology are huge, as an ever-growing mountain of individual data is collected, analyzed and used for profiling people. Individuals could soon face a world where they could not opt out of data collection even if they wanted to. For example, hackers might steal large amounts of medical or locational data from servers, and with more data inevitably come more opportunities for break-ins. Even shedding the use of all electronic devices could become an obsolete way of hiding from prying eyes, as CCTV cameras or the more futuristic small unmanned aerial vehicles (UAVs) could soon be cheap enough and effective enough to monitor large areas and use face recognition software to track the movements of individuals. While said technologies could have great utility in, say, the security surveillance of industrial sites, the possibilities for misuse by undemocratic governments or criminals are also vast.
International legislation regulating the use of data is woefully underdeveloped, and it will be hard to keep it up to date considering the rapid development of technology and the difficulties of multi-country legal negotiations. Thus companies producing the new wave of products and services have more power and responsibility than they should have in defining the new rules of engagement of the digital world. These companies have to be very careful to avoid causing serious harm to society (and to their businesses in the form of the resulting publicity backlashes, of course).
Trends of today
Use of data as a business tool is not a new idea. Large companies have done it for decades. What differentiates big data from the ordinary kind is the scale and the novel ways it is used, for example in automatic image and natural language processing. Natural language processing is already used by companies for sentiment analysis of customers in social media, and automatic image recognition is used by, for example, Google's self-driving cars. These technologies enable software to process unstructured data and integrate the obtained information with structured databases. Combined with machine learning, these technologies could become major drivers of new value for companies, as technology allows formerly laborious work conducted by people to be taken over by machines and wholly new business areas to be developed.
Companies can also use more and more external datasets in the future as government data is made public, the social media giants create and sell more data, and information companies such as Palantir make connections between different databases and sell these services to third parties. Benefits offered by cloud services, such as near-instant scalability and low barriers to market entry, mean that startups can now challenge the biggest players in their fields. The creative destruction of capitalism will allow new and better business models to replace older ones.
Promises for tomorrow
Despite all the other interesting new ways of doing business with big data, in my opinion the most important characteristic of the phenomenon lies in its potential for huge scalability. As the different uses for data diverge and proliferate, The Economist[vi] points to the commoditization of big data technology. A company vying for more data processing capabilities no longer has to build its own data centres from scratch, but can use more and more external service-based solutions and open-source software such as Hadoop, the de facto toolkit for big data analysis.
Extrapolating this trend to the future, combined with Moore's law, the exponentially growing stream of data and increasing sophistication of analysis tools, is where the really interesting possibilities lie. Even now a large chunk of data is stored and analysed with software of similar origins and shared between different entities[vii]. If different data sources and analysis platforms become increasingly compatible, co-dependent and interactive, a big data ecosystem is born. The leap from tailored IT services for each individual company towards an increasingly innovative, competitive, information-sharing ecosystem could transform business in the same way that smartphones have transformed the way people connect to each other. This is why big data is a big deal.
iii) The Economist, February 1st 2014, "Measuring health care: Need to know"
iv) Nassim Nicholas Taleb: "Antifragile: Things That Gain From Disorder" (Random House, 2012)
vi) The Economist, May 19th 2012, "Big data"
vii) IDC white paper: "Trends in Enterprise Hadoop Deployments", Ashish Nadkarni and Laura DuBois, 2013 [http://www.redhat.com/rhecm/rest-rhecm/jcr/repository/collaboration/site...