Monday, July 8, 2013

BIG DATA IS COMING TO A CLOUD NEAR YOU!*


Star Trek: The Next Generation fans will wonder whether the two words Big Data are descriptors for a new sentient android of epic proportions, a supersized upgrade of Lieutenant Commander Data. As an addicted Trekkie, I must sadly and quickly disabuse you. Well, maybe I’m wrong. There are people who consider Big Data every bit (or byte?) as exciting as the USS Enterprise’s second officer.
Big Data refers to immense data sets that are collected in fields as diverse as astronomy and genomics. As Wikipedia tells it, “as of 2012, every day 2.5 quintillion (2.5×10^18) bytes of data were created”, so there is a lot of data about. The dynamic of Big Data is the search for relationships among these data, teasing out correlations that may not be obvious from any of the constituent data sets alone. Our technical capacity to search immense data repositories means that correlations can be found in a way never before possible.

In their new book, Big Data: A Revolution That Will Transform How We Live, Work and Think, Viktor Mayer-Schönberger, an Internet governance academic from Oxford, and Kenneth Cukier, the data editor of The Economist, recount an interesting example of how Big Data, collected by Google from the three billion search requests it receives each day, was used to track influenza in the US.
Google took the 50 million ‘most common search terms used by Americans and compared the list with Centers for Disease Control (CDC) data on the spread of seasonal flu between 2003 and 2008’. After stupendous computer activity, they settled on 45 search terms that were strongly correlated with official figures. These included obvious terms such as flu, cough and cough medications, but also others that were not so obviously linked. ‘Unlike CDC, they could tell it in near real time, not a week or two after the fact.’ Although not without its critics and errors, Google Flu Trends is now available for many countries. http://google.about.com/od/experimentalgoogletools/qt/GoogleFluTrends.htm
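For readers curious about what "strongly correlated with official figures" might look like in practice, here is a minimal sketch in Python. It is not Google's method, which the book's account does not spell out in detail; it simply ranks search terms by their Pearson correlation with official case counts. The function names (pearson, rank_terms) and all of the weekly figures are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, official, top_k):
    """Return the top_k (term, correlation) pairs, strongest first."""
    scored = [(term, pearson(counts, official))
              for term, counts in term_counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Invented weekly figures, purely for illustration.
official_cases = [10, 20, 40, 30, 15]
weekly_searches = {
    "flu": [100, 200, 400, 300, 150],
    "cough medicine": [50, 70, 120, 100, 60],
    "football scores": [90, 80, 85, 95, 88],
}

top = rank_terms(weekly_searches, official_cases, top_k=2)
for term, r in top:
    print(f"{term}: r = {r:.3f}")
```

With these made-up numbers the two flu-related terms rise to the top and the unrelated term drops out. A high score here shows only that a term moves in step with the official figures, not that it causes or explains them.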
Mayer-Schönberger and Cukier accept that there is no universally accepted definition of Big Data, but see the term as referring to ‘things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organisations, the relationship between citizens and government, and more.’
Our capacity to collect, link and analyse data electronically is growing exponentially. Mayer-Schönberger and Cukier draw a parallel between the present and the era that followed the invention of the Gutenberg printing press around 1439. They quote an estimate that, in the half century starting in 1453, eight million books were printed, ‘more than all the scribes of Europe had produced since the founding of Constantinople 1,200 years earlier.’ In 2003, following a decade of effort, the human genome was sequenced. ‘Now… a single facility can sequence that much DNA in a day.’ And because Big Data includes all the data available, population samples will no longer be needed in the way they are today, and the work of statisticians will be redefined.
There are many features of Big Data to ponder for medicine. How will we practise with more information about correlation and less about causation? If Big Data shows that people who take regular exercise have better cancer survival, what will we advise our patients? Is the correlation sufficient to advise them to exercise, even though the causal pathway is not known? This will increase our need, and that of our patients, to live with uncertainty. What meaning do privacy and even confidentiality have in this new age? We should surely be thinking about and discussing these things now.

(Potential conflict of interest: SL’s son Nick leads Google France.)

*Previously published in MJA Insight on June 24, 2013