Informatics Research

Connecting the (Billions of) Dots

If managing your daily e-mail deluge keeps you up at night, consider the plight of Dr. Parsa Mirhaji. As director of clinical research informatics at Einstein and Montefiore, he is trying to integrate every piece of biological, clinical, environmental, and administrative data generated at these two institutions and beyond. What’s more, he’s spearheading an unprecedented effort to link clinical data across 22 regional health networks, medical centers, and research institutes (including Einstein and Montefiore)—a consortium encompassing thousands of studies, millions of patients and billions of bits of data.

Parsa Mirhaji, M.D., Ph.D.
“Most healthcare data is stored in silos, with separate and incompatible systems for patient records, clinical research findings, genomic data, and so on,” said Dr. Mirhaji, who is also research associate professor of systems & computational biology at Einstein. “This is true within institutions and between institutions. We’re not making full use of our biomedical data, which greatly hinders discovery and innovation.”

Converting Data into Better Outcomes

Dr. Mirhaji and other leaders in the field have devised a solution: build a highly connected “information commons” in which the findings from each research study and each patient encounter are systematically captured, assessed and, most importantly, used to improve patient care. That might mean administering therapies proven effective for others with similar symptoms and disease characteristics, personalizing therapies that work only for very small groups of patients with certain conditions, or developing entirely new therapies.

To achieve these goals, Einstein and Montefiore are creating a “semantic data lake”—a single reservoir that collects all data flowing into the two institutions. But that’s just the start. Dr. Mirhaji’s ultimate objective is not just to gather data but also to define, contextualize, annotate, characterize and index it—the “semantic” aspect of the data lake.

Semanticizing, or giving meaning to, each bit of data should take research to a higher level, enabling investigators to make new connections between genomic data and clinical phenomena, link environmental exposures to diseases, develop “smart” applications for clinical decision-making, create personalized therapies and assess community health needs in real time, to name just a few examples.

This approach is closely aligned with the recommendations of the President’s Council of Advisors on Science and Technology: individual pieces of data within a health record will be linked to their corresponding consent, authorization and other relevant metadata.

Harry Shamoon, M.D.
“In this era of precision medicine, there is great potential to customize treatment based on an individual’s genes,” said Dr. Harry Shamoon, associate dean for clinical and translational research at Einstein and director of the Block Institute for Clinical and Translational Research at Einstein and Montefiore. “Solving the puzzle of how to accurately and effectively marry the enormous amount of genomic data with electronic medical records that track clinical care could open up massive opportunities to advance research, and ultimately, improve health outcomes.”

Brian Currie, M.D., M.P.H.
“This is clearly going to change the landscape of clinical research, especially for our patient population,” said Dr. Brian Currie, vice president for medical research at Montefiore and Einstein’s assistant dean for clinical research at Montefiore. “We have an extraordinary amount of data from the 3.5 million patients in our healthcare network. Now we’ll be able to make full use of that data for things like comparative effectiveness research, in which we identify the best treatments for a range of diseases, from heart failure to diabetes to hepatitis C.”

The semantic data lake will also advance the field of predictive modeling. Researchers at Einstein and Montefiore will soon dive into the lake to identify which hospital patients are likely to wind up in the intensive care unit (ICU) and who is most likely to be readmitted after discharge.

“If we can identify these kinds of patients, and then focus our resources on preventing ICU stays and hospital readmissions, there’s a huge potential to improve outcomes and cut the cost of care,” said Dr. Shamoon.

The analytics architecture will make patients, providers and researchers equal partners in using health data to deliver better healthcare and to enhance research and the public good. Patients in particular will have full control over their health records—able to see everything their doctors know about them and to decide who will have access to their records.

In building the data lake, Dr. Mirhaji’s team has adopted an “open” architecture, ensuring that the entire enterprise is both “scalable” (capable of expanding to any size) and “extensible” (able to accommodate new components). “We want to avoid a situation where we’ve made the wrong technical or infrastructure investment that limits what kind of research we can do in the future,” said Dr. Mirhaji.

Considerable effort is also going into keeping the data secure. Clinical data, for example, will be anonymized before being used for research. Investigators will be able to re-identify patients only under strict protocols, and only to obtain their consent to take part in studies.

Big Data for a Big Region

Dr. Mirhaji with members of the Research Informatics Core.
Einstein and Montefiore’s big data initiatives extend well beyond the Bronx. Both institutions are active members of the New York City Clinical Data Research Network (NYC-CDRN), which links clinical researchers throughout the city in an effort to conduct health outcomes studies more efficiently and at lower cost. In its first year alone, the NYC-CDRN has gathered clinical data on more than three million patients, a number expected to double by next year. This mammoth network will give researchers access to larger and more diverse study populations.

“There are many important research questions that can’t be answered by looking at small, unrepresentative cohorts of patients,” said Dr. Mirhaji, who is leading the informatics efforts and technical architecture for NYC-CDRN. “Much of what we know about medicine is based on studies of middle-aged white men. We don’t know much about how to treat kids, women, older adults, and people of other ethnic backgrounds.”

“We’re a majority-minority community: 70 percent Hispanic and 19 percent African-American,” added Dr. Currie, who is the NYC-CDRN principal investigator at Einstein and Montefiore. “When the FDA approves a drug, it has rarely been evaluated in these populations. Through the NYC-CDRN, we can get these disenfranchised communities more involved in clinical studies.”

NYC-CDRN and its 10 sister networks around the country make up PCORnet, the National Patient-Centered Clinical Research Network. One of PCORnet’s aims is to foster research into some of the 7,000 rare diseases that collectively affect more than 25 million Americans.

“In most areas, a researcher may have access to perhaps 10 or 15 patients with a rare disease, which is usually not enough for a meaningful clinical trial,” said Dr. Currie. “But with PCORnet, a researcher could easily tap into a patient pool many times that size.”

PCORnet will also aid in the study of common diseases, which often require patient enrollments far beyond the scope of a single medical center. For PCORnet’s first research project, announced in May, the various CDRNs will participate in a study of heart-disease patients that compares the safety and effectiveness of low- and high-dose aspirin for preventing heart attack and stroke. Even though doctors have prescribed aspirin to such patients for decades, the ideal dose still isn’t known.

“So much of medicine is anecdotal,” said Dr. Currie. “We don’t have nearly as much data as we would like to inform clinical decision-making. Having access to millions of patients will let us answer many more questions with greater precision and rigor, which would ultimately improve patient outcomes.”

A century ago, the novelist E.M. Forster wrote in Howards End, “Only connect … Live in fragments no longer.” Forster’s subject was “prose” and “passion,” but his advice is well suited to the world of clinical research.

Posted on: Monday, July 13, 2015