How big data is being used in healthcare today

Uri Laserson, Data Scientist, Cloudera.

21 November 2014

We are being bombarded from all directions with information about "Big Data" and how it will change our lives, especially as it pertains to our health. But what is Big Data, and how is it actually being used in the healthcare system today?

More than anything, Big Data represents a change in perspective. Organizations have long understood the value in storing and analysing their data. What makes Big Data different is the realization that it can be highly valuable to store and analyse literally every bit of information that can be stored and analysed.

Furthermore, the rise of Big Data has been enabled by the development of a democratizing ecosystem of tools for processing such large quantities of data: Hadoop. In the past, the ability to analyse petabytes of data was inaccessible to all but the richest, most sophisticated organizations (like Google). But the rise of cloud computing and the Hadoop ecosystem has enabled even small groups of people to record, store, and process petabytes of data.

However, much of the popular discussion around Big Data tends to be high level and abstract, so below we describe a few specific uses of the Big Data mind-set that are actually being developed and deployed.

The first example concerns sepsis, an often fatal condition caused by a strong inflammatory response to an infection, frequently acquired during hospitalization. Early detection of sepsis is crucial for increasing the chances of survival, which requires constant monitoring of patients. While most hospitalized patients are connected to continuous monitoring systems (tracking vital signs like heart rate, oxygen saturation, and body temperature), this data is generally not stored or analysed in bulk over the long term.

Today, one large hospital chain is using the Hadoop ecosystem to collect such continuous monitoring data and train predictive models for a patient's risk of developing sepsis over the coming hour. These models are then deployed on another Hadoop system that continuously scores incoming monitoring data, generating real-time predictions for each hospitalized patient and enabling septic patients to be treated as early as possible.
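To make the idea concrete, here is a minimal sketch of what scoring a monitoring window against a sepsis-risk model might look like. The feature names, model weights, and logistic form are all invented for illustration; the article does not describe the hospital chain's actual model.

```python
import math

# Illustrative logistic-model weights -- NOT from any real sepsis model.
# A real deployment would learn these from historical monitoring data.
WEIGHTS = {"heart_rate": 0.03, "temp_c": 0.8, "spo2": -0.10, "bias": -26.0}

def sepsis_risk(window):
    """Score one vital-sign window with a toy logistic model:
    risk = sigmoid(w . x + b), returning a probability in (0, 1)."""
    z = WEIGHTS["bias"]
    z += WEIGHTS["heart_rate"] * window["heart_rate"]
    z += WEIGHTS["temp_c"] * window["temp_c"]
    z += WEIGHTS["spo2"] * window["spo2"]
    return 1.0 / (1.0 + math.exp(-z))

# Two hypothetical one-minute windows for the same patient.
stable = {"heart_rate": 72, "temp_c": 36.8, "spo2": 98}
deteriorating = {"heart_rate": 118, "temp_c": 39.2, "spo2": 90}

# A deteriorating patient should score higher risk than a stable one.
print(sepsis_risk(stable) < sepsis_risk(deteriorating))  # True
```

In a production pipeline, a function like this would run continuously against each patient's incoming data stream, raising an alert whenever the score crosses a clinically validated threshold.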

Another major example concerns health insurers' attempts at controlling spiralling healthcare costs in the US, where a small fraction of individuals is responsible for a disproportionately large fraction of healthcare spending. Preventative health is one of the pillars of reducing healthcare costs, so insurers are using a diverse array of data to train predictive models that attempt to classify which policy-holders will need the most attention.

Specifically, the insurers are using large troves of claims data, supplemented with whatever else they can access, including consumer data, social media data, and even data from self-tracking devices they distribute, like Fitbits. By training such classification models, the insurers can identify and proactively address the patients with the most need, simultaneously providing better care and reducing costs to the healthcare system at large.
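The output of such a model is ultimately a ranking of policy-holders by expected need. The toy sketch below invents a handful of member records and a hand-written scoring rule to show the shape of that ranking step; a real insurer would instead learn the scoring function from historical claims outcomes.

```python
# Hypothetical member records -- fields and values are invented for this sketch.
members = [
    {"id": "A", "claims_last_year": 2,  "chronic_conditions": 0, "avg_daily_steps": 9000},
    {"id": "B", "claims_last_year": 14, "chronic_conditions": 3, "avg_daily_steps": 2500},
    {"id": "C", "claims_last_year": 6,  "chronic_conditions": 1, "avg_daily_steps": 6000},
]

def outreach_score(member):
    """Toy risk score: more claims and chronic conditions raise it,
    higher activity (e.g. from a wearable device) lowers it."""
    return (2.0 * member["claims_last_year"]
            + 5.0 * member["chronic_conditions"]
            - member["avg_daily_steps"] / 2000.0)

# Rank members from highest to lowest expected need for outreach.
ranked = sorted(members, key=outreach_score, reverse=True)
print([m["id"] for m in ranked])  # ['B', 'C', 'A']
```

The insurer would then direct preventive-care resources toward the top of this list, which is where a small fraction of members drives a large fraction of spending.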

The number of use cases actively being developed is large and diverse. Electronic medical record (EMR) companies are storing their records in centralized Hadoop clusters, allowing for population-wide analyses. Pharmaceutical companies are instrumenting their vaccine manufacturing facilities to help improve efficiency and yields. Hospitals are tracking large amounts of prescription data to help fight drug abuse and fraud. Indeed, the attitude that all data can be valuable, coupled with the empowerment of the Hadoop ecosystem, is already having a significant impact on the quality and efficiency of our healthcare system.

See also:

The barriers to tapping into Big Content and how to overcome them

Protecting critical healthcare data in the era of 'big data'

In cardiology, Big Data covers the ‘whole’ patient
