Machine learning turns raw data into real knowledge

We are in the age of big data. The number of biomedical variables that can be measured has exploded as technology advances. It is easy for clinicians to be overwhelmed, so tools are needed to help interpretation. At the World Congress of Neurology 2019, Professor Sergio Baranzini (UCSF Weill Institute for Neurosciences, San Francisco, USA) and Professor Dina Katabi (MIT, Cambridge, USA) showed how machine learning approaches can turn raw data into useful knowledge.

More data were generated in 2017 than in the rest of human history combined. The total stored digital data worldwide in 2018 was estimated at 33 zettabytes (1 zettabyte = 10⁹ terabytes = 10²¹ bytes).¹

But data on their own are useless. Adding context, such as categories, turns data into usable information. The interactions and relationships between sets of information then become knowledge.

Complex issue

Biomedicine has arguably the biggest data problem of any sector. Data can be collected from the genome, transcriptome, proteome, biochemical pathways, cellular processes, and physiological processes.

Research has been successful at understanding each of these aspects individually. But they interact in a highly complex and non-linear way, which makes it difficult to make predictions for the complete system.

Biomedicine arguably has the biggest data problem of any sector

Both speakers are developing approaches to help clinicians use data to track and predict the course of disease in each patient using real-time monitoring.

Harnessing AI

Professor Baranzini presented a “database of databases” called the Scalable Precision Medicine Oriented Knowledge Engine (SPOKE), which draws on publicly available population-level biomedical data.

Artificial intelligence computes the probability that any two concepts in the data are related. The whole dataset is then mapped into a network of connected nodes to show new relationships. For example, the network path between a disease and a medication normally used for a different disease could open up new treatment possibilities.
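The idea of following a network path from a disease to a medication can be illustrated with a minimal sketch. It is not SPOKE's implementation: the node names, the edges, and the helper functions `build_graph` and `shortest_path` are all hypothetical, and a plain breadth-first search stands in for whatever scoring the real system uses.

```python
from collections import deque

# Toy knowledge graph. Every node and edge here is invented for illustration;
# none of it is taken from SPOKE.
EDGES = [
    ("Disease A", "Gene X"),
    ("Gene X", "Pathway P"),
    ("Pathway P", "Drug D"),
    ("Disease B", "Drug D"),   # Drug D is "normally used" for Disease B
    ("Disease A", "Symptom S"),
]


def build_graph(edges):
    """Return an undirected adjacency map: node -> set of neighbouring nodes."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest node path or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so the result is deterministic.
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(EDGES)
path = shortest_path(graph, "Disease A", "Drug D")
print(" -> ".join(path))
```

In this toy graph, the path links Disease A to a drug that is otherwise connected only to Disease B, which is the kind of indirect connection the talk described as a possible lead for new treatments.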

Characteristic heat maps

The capabilities of SPOKE have expanded with the addition of 1 million anonymized electronic health records from the UCSF Information Commons.²

An algorithm embeds each patient in the network and maps all their available data, such as genotyping, lab tests, and diagnoses. The timing of each observation is taken into account. Machine learning approaches create a 2D heat map for each patient containing about 130,000 pixels, where every pixel contains defined information.

Data can define a unique heat map for each disease

Heat maps are also created for diseases. This allows vast amounts of data to be compared easily to identify trends. For example, the heat map for post-traumatic stress disorder (PTSD) was most similar to bipolar disorder, nicotine dependence, and intracranial aneurysm.
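One simple way to compare heat maps is to flatten each map into a vector of pixel values and rank the others by similarity. The sketch below assumes cosine similarity as the measure, which the talk did not specify. The four-pixel maps and the disease labels are made up, and `cosine_similarity` and `most_similar` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, maps, top_n=3):
    """Rank every other map by similarity to the target, highest first."""
    scores = [
        (name, cosine_similarity(maps[target], vector))
        for name, vector in maps.items()
        if name != target
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


# Made-up heat maps flattened to 4 pixels each (real maps have about 130,000).
heat_maps = {
    "Disease A": [1.0, 0.0, 2.0, 0.0],
    "Disease B": [2.0, 0.0, 4.0, 0.0],  # same pattern as A, twice the intensity
    "Disease C": [0.0, 3.0, 0.0, 1.0],  # no pixels in common with A
    "Disease D": [1.0, 1.0, 1.0, 1.0],
}

ranking = most_similar("Disease A", heat_maps)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Cosine similarity ignores overall intensity and compares only the pattern of pixels, so two maps with the same shape but different magnitudes score as near-identical.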

Adding value

The power of this approach is demonstrated with multiple sclerosis (MS), a condition with complicated interactions between genetics and environment.

The relationship between the gut microbiome and the host immune system may be a key factor in MS. The SPOKE database included this information for 5000 patients with MS, which was used to create heat maps for subtypes of the disease.

The resulting heat maps were able to distinguish between MS subtypes. For example, patients with primary progressive MS (PPMS) had more genetic nodes. This aligns with studies showing that genetics has a stronger influence in PPMS than other subtypes.

Passive monitoring

Some conditions, such as Alzheimer’s disease, may require continuous patient monitoring. Wearable devices exist, but these may not be convenient or could cause distress to patients with dementia.

Professor Katabi introduced the concept of ‘invisibles’ – technology that enables passive monitoring of vulnerable patients without the same personal impact as wearables.

In this case, an electromagnetic (EM) signal of about one-thousandth the power of Wi-Fi is emitted by a small device installed in the patient’s living space. Importantly, the signal does not interfere with other EM signals, such as those from cellphones, or with medical devices, such as pacemakers.

‘Invisibles’ monitor behavior without disturbing the patient

Almost everything a person does changes the shape of the returned signal detected by the box. Because the EM signal partially passes through the body, this includes internal processes such as the heartbeat. Machine learning was used to connect the signal characteristics with corresponding patient actions and physiology.

Help for carers

Patients with the same disease can have very different symptoms. Invisibles can act as “digital biomarkers” to help carers to address individual needs more effectively.

The directionality of the signal can be combined with a map of the living space. This context is important. For example, if the patient is lying in bed, they are likely asleep. But if they are lying on the floor in the kitchen, they may have fallen. The system can then alert carers.
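The role of context in the bed-versus-kitchen example can be sketched as a small rule. This is only an illustration, not the logic of Professor Katabi's system: the `Observation` type, the posture and location labels, and the `interpret` and `should_alert` functions are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    posture: str   # hypothetical labels: "lying", "sitting", "standing"
    location: str  # hypothetical labels taken from a map of the living space


# Places where lying down is expected. Only "bed" comes from the article;
# a real deployment would presumably configure this per home.
EXPECTED_LYING_LOCATIONS = {"bed"}


def interpret(obs):
    """Combine posture with location to give the posture its context."""
    if obs.posture == "lying":
        if obs.location in EXPECTED_LYING_LOCATIONS:
            return "likely asleep"
        return "possible fall"
    return "no concern"


def should_alert(obs):
    """Carers are alerted only when a fall is suspected."""
    return interpret(obs) == "possible fall"


in_bed = Observation(posture="lying", location="bed")
on_floor = Observation(posture="lying", location="kitchen floor")

print(interpret(in_bed), should_alert(in_bed))
print(interpret(on_floor), should_alert(on_floor))
```

The same posture produces opposite conclusions depending on where it occurs, which is why the map of the living space matters.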

Digital biomarkers improve the care of vulnerable patients

Long-term trends also inform care. For example, cumulative data could show that a patient spends most of their day sitting. This patient could then be encouraged to exercise more often.

Assessment without distress

Sleep is often disrupted in neurodegenerative disease. Sleep monitoring is normally performed in a sleep lab by covering the patient’s head with electroencephalogram (EEG) sensors. This may cause anxiety, and the results might not reflect the patient’s normal sleep. Professor Katabi’s device, which needs no sensors on the body, can distinguish sleep stages with 80% accuracy.

Disease severity scales could also be applied accurately. Professor Katabi conducted a pilot study with seven patients with Parkinson’s disease, focusing on gait, home activity, and time in bed.³ The walking trajectory of control subjects was smooth, whereas patients with Parkinson’s disease walked more slowly and followed a “wiggly” trajectory. This could indicate how severe a patient’s balance issues are, and whether they improve with medication. Additional studies in a larger cohort will be key to validating these assessments.
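One way to put a number on how “wiggly” a walking trajectory is would be tortuosity: the distance actually walked divided by the straight-line distance between the start and end points. The sketch below assumes that measure for illustration only; it is not the metric used in the pilot study, and the coordinates and function names are invented.

```python
import math


def path_length(points):
    """Total distance walked along consecutive (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def tortuosity(points):
    """Walked distance divided by straight-line distance (1.0 = perfectly straight)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("start and end points coincide")
    return path_length(points) / straight


# Invented trajectories in metres, both covering the same 3 m of floor.
smooth = [(0, 0), (1, 0), (2, 0), (3, 0)]
wiggly = [(0, 0), (1, 1), (2, -1), (3, 0)]

print(f"smooth: {tortuosity(smooth):.2f}")
print(f"wiggly: {tortuosity(wiggly):.2f}")
```

A value near 1 corresponds to a smooth, direct walk, and larger values indicate more lateral deviation. Walking speed would need a separate measure based on timestamps.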
