The COVID-19 Evidence Accelerator, organized by the Reagan-Udall Foundation for the FDA and Friends of Cancer Research (Friends), convenes leaders in the health care data and analytics space to answer key questions about COVID-19 with real-world data (RWD). 

By banding these organizations together, the project aims to create a shared learning environment that maximizes the impact of members’ individual efforts. Together, via dedicated therapeutics and diagnostics workgroups, participants aim to advance the methods used to turn a range of RWD into actionable COVID-19 insights. 

The first set of analyses for the COVID-19 Therapeutics Evidence Accelerator’s Parallel Analysis Workgroup focuses on the research question: Among patients with COVID-19 infection who were treated with hydroxychloroquine in an inpatient setting, what are the typical treatment patterns, co-medications, and effects on select health outcomes? Several organizations investigated this research question in parallel using their own RWD, then came together to discuss findings and challenges. 

As this initial work nears completion, the group has made significant progress toward developing analytic methods and strategies to apply to potential interventions—both existing and future. While the FDA has since revoked hydroxychloroquine’s Emergency Use Authorization, Jeff Allen, Ph.D., President and CEO of Friends and co-lead for the Evidence Accelerator, calls this work a “critical component” of the effort to better understand COVID-19. These early strides shed light on RWD’s prospects in answering important COVID-19 questions, and, importantly, inform a master protocol to speed analyses across a wide range of research questions. 

Recently, the Parallel Analysis Workgroup met to discuss takeaways from this first question set. Carla Rodriguez-Watson, Ph.D., Scientific Director of the Reagan-Udall Foundation’s IMEDS program and co-lead for the Evidence Accelerator, noted “the organizations were enthusiastic about working together, and were willing to share some of the insights they had gathered along the way, such as how to manage inconsistent coding of mechanical ventilation and coding algorithms to identify medications of interest.”

Below, we summarize three of the early learnings, brought forth by teams from Syapse, COTA Healthcare/Hackensack Meridian Health (HMH), Health Catalyst, Dascena, TriNetX, and Aetion/HealthVerity (see here for more details).

  1. Researchers must first understand the nuances of the data before beginning an analysis.
    Before beginning any RWD analysis, it is important to first gain a full understanding of the data. This is particularly urgent, and challenging, with COVID-19, given the rapidly evolving nature of the disease and the need to access data as they accrue. 

    Participants encountered distinct nuances across the data sources. As Syapse researchers analyzed COVID-19 cases in their hospital-based oncology data, they realized they needed more detailed outcomes data, including patient mortality and discharge status, to glean the needed insights. Health Catalyst researchers found in their health system data that patients often received multiple medications during a hospitalization, and that these were documented as separate administrations. They needed to understand how unique medication administrations were captured in the data so they could properly account for them in the analysis. Aetion researchers worked with HealthVerity, using their proprietary de-identification and linking methodologies to link databases containing medical and pharmacy claims, electronic health records, and labs and hospital chargemaster (inpatient) data. Upon recognizing multiple ways to identify baseline comorbidities and outcomes, Aetion worked with HealthVerity to clarify uncertainties in the data and to select the subgroups most appropriate for the analysis.
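The medication-administration issue Health Catalyst describes can be illustrated with a minimal sketch: collapsing repeated administration records for the same drug during one hospitalization into a single treatment exposure. The records, field names, and values below are hypothetical, not Health Catalyst's actual data model.

```python
# Hypothetical administration records: (patient_id, encounter_id, drug, timestamp)
administrations = [
    ("p1", "e1", "hydroxychloroquine", "2020-04-01T08:00"),
    ("p1", "e1", "hydroxychloroquine", "2020-04-01T20:00"),
    ("p1", "e1", "azithromycin",       "2020-04-02T09:00"),
    ("p2", "e2", "hydroxychloroquine", "2020-04-03T10:00"),
]

def collapse_exposures(records):
    """Reduce repeated administrations to one exposure per
    (patient, encounter, drug), keeping the first administration time."""
    first_seen = {}
    for pid, enc, drug, ts in records:
        key = (pid, enc, drug)
        # ISO-8601 timestamps compare correctly as strings
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    return first_seen

exposures = collapse_exposures(administrations)
print(len(exposures))  # → 3 exposures, not 4 administrations
```

Counting exposures rather than raw administrations avoids inflating the size of the treated cohort.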

    As the participants continue to acquaint themselves with these datasets and others, Susan C. Winckler, R.Ph., Esq., CEO of the Reagan-Udall Foundation, posited data exploration as a critical step in laying the groundwork for future analyses: “Just getting comfortable with the dataset, the patients, and the results is part of what we’ll need as we move forward with other question sets.”
  2. Researchers must identify and address confounding specific to COVID-19 data.
    As always, it is important to address confounding in the data in order to compare groups in an analysis. With COVID-19, however, much of the natural history is not yet known, and researchers must rely on their best judgment when seeking evidence of confounding. With these parallel analyses, the participants experienced this challenge as they adjusted for common confounding factors in COVID-19 data.

    Age, for example, is an important driver of overall survival for people with COVID-19, but older patients are also more likely to have comorbidities. The COTA and HMH team, which used COTA’s data and analytics expertise to analyze HMH’s COVID-19 data, was tasked with understanding the true driver of treatment decisions and overall survival outcomes: Was it COVID-19, or was it something else?

    This research team used different propensity score matching methods to account for these confounding factors. The data initially showed differences in mortality between the hydroxychloroquine-treated and untreated patients, but once they adjusted for comorbidities, the differences in mortality between the groups lessened.
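To illustrate the matching step in such an analysis (this is not COTA/HMH's actual pipeline), the sketch below performs greedy 1:1 nearest-neighbor matching of treated to untreated patients on precomputed propensity scores, within a caliper. The patient IDs, scores, and caliper value are hypothetical.

```python
def nearest_neighbor_match(treated, untreated, caliper=0.05):
    """Greedy 1:1 matching on propensity score within a caliper.
    treated/untreated: dicts mapping patient_id -> propensity score."""
    available = dict(untreated)
    pairs = []
    # Process treated patients in score order for determinism
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        best_id, best_gap = None, caliper
        for u_id, u_score in available.items():
            gap = abs(t_score - u_score)
            if gap <= best_gap:
                best_id, best_gap = u_id, gap
        if best_id is not None:
            pairs.append((t_id, best_id))
            del available[best_id]  # match without replacement
    return pairs

treated = {"t1": 0.62, "t2": 0.35}
untreated = {"u1": 0.60, "u2": 0.33, "u3": 0.90}
print(nearest_neighbor_match(treated, untreated))  # → [('t2', 'u2'), ('t1', 'u1')]
```

After matching, outcomes such as mortality are compared within the matched pairs, so that treated and untreated groups are balanced on the covariates that went into the score.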
  3. Researchers will continue to navigate inconsistencies in coding and reporting in COVID-19 data.
    As the health care industry learns more about COVID-19 and the disease is better characterized, the landscape of COVID data continues to evolve. As a result, participants learned that it is important to stay vigilant in using these rapidly growing datasets, as reporting and outcomes can vary within and across them over time.

    Some groups grappled with inconsistent presentation of COVID-19 in the data: Syapse, for example, saw inconsistent coding for the use of mechanical ventilation, so they employed natural language processing to gather clarifying data from physician notes.
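A minimal sketch of how free text can backstop inconsistent structured coding follows; Syapse's actual NLP approach is not described in the source, and the patterns and notes below are hypothetical. A real clinical NLP pipeline would also handle negation and context.

```python
import re

# Hypothetical phrases suggesting mechanical ventilation in a physician note
VENT_PATTERNS = re.compile(
    r"\b(mechanical(ly)? ventilat\w*|intubat\w*|on (the )?ventilator)\b",
    re.IGNORECASE,
)

def mentions_ventilation(note: str) -> bool:
    """Flag notes that mention mechanical ventilation."""
    return bool(VENT_PATTERNS.search(note))

notes = [
    "Patient intubated and placed on mechanical ventilation.",
    "Oxygen via nasal cannula; no respiratory distress.",
]
print([mentions_ventilation(n) for n in notes])  # → [True, False]
```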

    Health Catalyst researchers learned that defining the hydroxychloroquine- or azithromycin-treated population isn’t as straightforward as it sounds. For example, a researcher first needs to know both the date of diagnosis and the date of hospitalization. If the presumed diagnosis and the COVID-confirming lab result did not occur on the same date, which should be considered the diagnosis date?
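One defensible convention, though by no means the only one, is to anchor on the earlier of the presumed-diagnosis date and the confirming lab date. The sketch below illustrates that choice; the function name and dates are hypothetical, not Health Catalyst's actual rule.

```python
from datetime import date
from typing import Optional

def diagnosis_date(presumed: Optional[date],
                   lab_confirmed: Optional[date]) -> Optional[date]:
    """Pick an index date when presumed diagnosis and lab confirmation
    fall on different days: the earlier of whichever dates are available."""
    candidates = [d for d in (presumed, lab_confirmed) if d is not None]
    return min(candidates) if candidates else None

print(diagnosis_date(date(2020, 4, 1), date(2020, 4, 3)))  # → 2020-04-01
```

Whatever convention a team picks, it has to be applied consistently, since the index date determines which medications count as treatment versus baseline.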

    Another challenge Health Catalyst identified was navigating “presumed positive” COVID-19 cases. In some instances, a positive lab test result made the COVID-19 diagnosis clear, but other cases were “presumed positive” based on the symptoms and diagnoses a patient presented. To address this, Health Catalyst researchers designed a scoring system to assign confidence levels to “presumed positive” cases, using their own set of ICD-10 codes to identify cases in the data and updating it as new symptoms, such as loss of taste and smell, were defined. They ranked a patient’s “presumed positive” designation on a scale from high to low confidence depending on the number of symptoms or diagnoses presented.
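The ranking idea can be sketched as a simple count of suggestive findings mapped to confidence tiers. The symptom list, thresholds, and tier labels below are hypothetical illustrations, not Health Catalyst's actual scoring system.

```python
# Hypothetical COVID-suggestive findings; the real system used ICD-10 codes
# and was updated as new symptoms (e.g., loss of taste/smell) were defined
COVID_SIGNALS = {"fever", "cough", "dyspnea", "anosmia", "loss_of_taste"}

def presumed_positive_confidence(patient_findings):
    """Rank a 'presumed positive' case by how many suggestive
    findings the patient presented."""
    hits = len(COVID_SIGNALS & set(patient_findings))
    if hits >= 3:
        return "high"
    if hits == 2:
        return "medium"
    if hits == 1:
        return "low"
    return "none"

print(presumed_positive_confidence({"fever", "cough", "anosmia"}))  # → high
```

A tiered score like this lets downstream analyses include or exclude presumed-positive patients by confidence level, and makes sensitivity analyses straightforward.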

Looking ahead to future analyses
Ellen Sigal, Ph.D., Chairperson of both Friends and the Reagan-Udall Foundation, underscored the importance of the combined efforts, and the concerted focus on understanding the data and their imperfections: “We’re going to solve this problem much faster and with more accuracy as a community.”

As the group finalizes the hydroxychloroquine results, and then quickly shifts to remdesivir and heparin analyses, we can apply these learnings to better understand COVID-19, as well as how best to approach the data, analyses, and resultant real-world evidence (RWE) in furthering this understanding. 

In the coming months, we’ll continue to publish learnings from the COVID-19 Evidence Accelerator on the Evidence Hub; subscribe to the Evidence Digest below to stay up to date.