When considering real-world evidence (RWE) for decision-making, we often focus on the input, the data itself, and whether it is fit for purpose. Decision-makers in biopharma and regulatory agencies go a step further when they evaluate RWE by asking: Can we trace our way from the study results back to the original source data? The ability to do so is called data traceability.

We need clear data traceability because the end product of a real-world data analysis looks quite different from its raw input. The raw data, such as tables upon tables of EHR or claims data, must be cleaned, transformed, and linked before they can be analyzed. Unless each transformation and linkage is tracked, decision-makers cannot be confident that the approved data are authentically represented in the results of an RWE analysis.

When paired with a clear study design, data traceability underpins the validity of a study. It ensures the transparency required to judge the rigor and scientific merit of the study, and the reproducibility required by other investigators to independently verify the findings.

Established approaches to documenting data traceability can be both laborious and prone to error. For example, an investigator must diligently record each step of a dataset’s journey when switching among systems (e.g., from raw data to a common data model format to SAS-constructed cohorts to R-coded variables). If the investigator fails to record even one transformation step along the way, the chain of traceability can break. It also becomes challenging to document the data’s journey accurately when a study evolves slowly over time or builds on previous versions.
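To make the idea of recording each step concrete, here is a minimal sketch of an append-only transformation log. It is an illustration under stated assumptions, not a description of any particular platform: the `TraceLog` class, the `fingerprint` helper, the step names, and the sample records are all hypothetical. Each step stores a hash of its input and output, so a gap in the chain (a transformation applied outside the log) becomes detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(rows):
    """Return a SHA-256 digest of a list of dict rows, independent of key order."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class TraceLog:
    """Hypothetical append-only record of every transformation applied to a dataset."""
    steps: list = field(default_factory=list)

    def apply(self, name, func, rows):
        """Run func on rows and record what went in and what came out."""
        result = func(rows)
        self.steps.append({
            "step": name,
            "input_hash": fingerprint(rows),
            "output_hash": fingerprint(result),
            "rows_in": len(rows),
            "rows_out": len(result),
        })
        return result

    def is_unbroken(self):
        """True if each step's input matches the previous step's output."""
        return all(
            prev["output_hash"] == cur["input_hash"]
            for prev, cur in zip(self.steps, self.steps[1:])
        )


# Hypothetical raw records; real EHR or claims data would be far larger.
raw = [
    {"patient_id": "P1", "age": 64, "dx": "I10"},
    {"patient_id": "P2", "age": None, "dx": "E11"},
    {"patient_id": "P3", "age": 71, "dx": "I10"},
]

log = TraceLog()
cleaned = log.apply(
    "drop_missing_age",
    lambda rs: [r for r in rs if r["age"] is not None],
    raw,
)
cohort = log.apply(
    "select_hypertension",
    lambda rs: [r for r in rs if r["dx"] == "I10"],
    cleaned,
)

for step in log.steps:
    print(step["step"], step["rows_in"], "->", step["rows_out"])
print("chain unbroken:", log.is_unbroken())
```

If an investigator modified `cleaned` outside the log before the second step, the hashes would no longer line up and `is_unbroken()` would return `False`, which is the kind of silent gap that manual documentation tends to miss.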

One way to ensure data traceability is to complete a study from beginning to end on a single closed system or platform.

In this way, stewardship of the data—from its raw form to the ultimate RWE deliverable—is completed in one environment and automatically documented.

Platforms can run and document data checks from the first point of transformation, starting with the data load and continuing through the point at which the raw data are transformed into analyzable information. Such verifications include rules-based sanity checks: Do the imported datasets meet technical expectations? Do they include all the expected information? Are all patients accounted for? Verification also requires semi-automated validation: Do the imported datasets meet scientific expectations?
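The rules-based questions above can be sketched as simple automated checks run at data load. This is only an illustration: the function `run_sanity_checks`, the column names, and the patient identifiers are assumptions made for the example, and the scientific validation step is deliberately left out because it requires expert review.

```python
def run_sanity_checks(rows, expected_columns, expected_patient_ids):
    """Return a dict mapping each check name to a (passed, detail) tuple."""
    results = {}
    expected = set(expected_columns)

    # Technical expectations: every row carries exactly the expected columns.
    bad_schema = [i for i, row in enumerate(rows) if set(row) != expected]
    results["schema_matches"] = (
        not bad_schema,
        f"rows with unexpected columns: {bad_schema}",
    )

    # Expected information: no expected field is left empty.
    incomplete = [
        i for i, row in enumerate(rows)
        if any(row.get(col) is None for col in expected_columns)
    ]
    results["no_missing_values"] = (
        not incomplete,
        f"rows with missing values: {incomplete}",
    )

    # Patient accounting: the loaded IDs match the IDs we expected to receive.
    loaded_ids = {
        row["patient_id"] for row in rows if row.get("patient_id") is not None
    }
    missing = sorted(set(expected_patient_ids) - loaded_ids)
    unexpected = sorted(loaded_ids - set(expected_patient_ids))
    results["all_patients_accounted_for"] = (
        not missing and not unexpected,
        f"missing: {missing}, unexpected: {unexpected}",
    )
    return results


# Hypothetical expectations and a hypothetical load with two problems.
EXPECTED_COLUMNS = ["patient_id", "age", "dx"]
EXPECTED_PATIENTS = ["P1", "P2", "P3"]

loaded = [
    {"patient_id": "P1", "age": 64, "dx": "I10"},
    {"patient_id": "P2", "age": None, "dx": "E11"},
]

report = run_sanity_checks(loaded, EXPECTED_COLUMNS, EXPECTED_PATIENTS)
for check, (passed, detail) in report.items():
    print(f"{'PASS' if passed else 'FAIL'}  {check}: {detail}")
```

Returning the detail alongside each pass or fail result means the outcome of every check can be stored with the study record rather than only acted on at load time.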

Platforms also enable easy interpretation by reviewers through reports that provide, in natural language, complete documentation of how data are put to use in a study. Reviewers can view the version history of a single transformation, and a cumulative version history for all transformations in an analysis.
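As a rough sketch of what a per-transformation and cumulative version history might look like, the example below keeps a plain-language description for each revision of a transformation. The `Transformation` class, the `cumulative_history` function, and the revision text are hypothetical and are not drawn from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Transformation:
    """Hypothetical transformation with a plain-language revision history."""
    name: str
    versions: list = field(default_factory=list)

    def revise(self, description):
        """Record a new version and return its 1-based version number."""
        self.versions.append(description)
        return len(self.versions)

    def history(self):
        """Version history for this single transformation."""
        return [
            f"{self.name} v{number}: {description}"
            for number, description in enumerate(self.versions, start=1)
        ]


def cumulative_history(transformations):
    """Version history for all transformations in an analysis, in order."""
    lines = []
    for transformation in transformations:
        lines.extend(transformation.history())
    return lines


clean = Transformation("drop_missing_age")
clean.revise("Exclude records with no recorded age.")
clean.revise("Exclude records with no recorded age or age under 18.")

select = Transformation("select_hypertension")
select.revise("Keep patients with diagnosis code I10.")

lines = cumulative_history([clean, select])
for line in lines:
    print(line)
```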

Data traceability enables reviewers and decision-makers to have more confidence in RWE studies. A clear, consistent approach to data traceability is an important—indeed, essential—step forward in standardizing the use of real-world evidence in decision-making.

Jeremy Rassen, an epidemiologist and computer scientist with nearly 25 years of experience in the science and technology of big data, is co-founder, president, and chief science officer of Aetion.