When designing comparative studies, epidemiologists aim for exchangeability between treatment groups. Overall, the people who receive Treatment A should have the same distribution of traits as the people who receive Treatment B. When they do, the groups are exchangeable, and differences in health outcomes can be attributed to the treatments being compared.

In the real world (i.e., outside of randomized controlled trials), the treatment a patient receives depends on a wide array of factors: age, gender, medical history, physician preference, patient preference, lifestyle, and more. When these factors also influence health outcomes, they are confounders; left unaddressed, they prevent us from achieving exchangeability.

For example, patients with uncontrolled diabetes are more likely to be prescribed a second-line therapy, such as an SGLT-2 inhibitor or a sulfonylurea, in addition to metformin, rather than metformin alone. As a result, if patients on the combination have poorer outcomes than patients on metformin alone, the difference may reflect the poorer health of the patients who received the combination rather than an effect of the drug itself.
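
To make this bias concrete, here is a small arithmetic sketch using invented counts (not data from any study). Within each stratum of diabetes control, the second-line therapy and metformin alone carry exactly the same risk of a hypothetical adverse outcome, but because 80 percent of uncontrolled patients receive the second-line therapy versus 20 percent of controlled patients, the crude comparison makes the second-line therapy look roughly twice as harmful. The stratum names and helper functions are illustrative only.

```python
# Hypothetical counts, invented for illustration: (events, patients) per group
cohort = {
    "uncontrolled": {"second_line": (160, 800), "metformin_only": (40, 200)},
    "controlled": {"second_line": (10, 200), "metformin_only": (40, 800)},
}

def risk(events, patients):
    """Proportion of patients who experienced the outcome."""
    return events / patients

def risk_ratio(treated, comparator):
    """Risk in the treated group divided by risk in the comparator group."""
    return risk(*treated) / risk(*comparator)

# Crude comparison: pool both strata, ignoring diabetes control (the confounder)
crude_treated = tuple(sum(s["second_line"][i] for s in cohort.values()) for i in range(2))
crude_comparator = tuple(sum(s["metformin_only"][i] for s in cohort.values()) for i in range(2))
crude_rr = risk_ratio(crude_treated, crude_comparator)

# Stratified comparison: compare like with like within each level of the confounder
stratum_rr = {name: risk_ratio(s["second_line"], s["metformin_only"]) for name, s in cohort.items()}

print(f"crude RR: {crude_rr:.3f}")
for name, rr in stratum_rr.items():
    print(f"{name} RR: {rr:.3f}")
```

Stratifying works here because there is a single binary confounder; with many confounders, the number of strata explodes, which is part of what motivates summarizing them in one score.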

Our goal in designing a study, then, becomes balancing these factors between our treatment populations so that when we observe a difference in outcomes, we can attribute that difference to the treatments alone.

As epidemiologists working with real-world data, we have a toolkit of study designs and analytical methods for taking this confounding information into account so that we can achieve exchangeability. One of the most useful tools in the kit is the propensity score.

Calculated for every patient, a propensity score estimates the probability that the patient will receive the treatment of interest, given the measured confounders included in the model. Because it is a probability, the propensity score ranges from 0 to 1: 0 means the patient has no chance of receiving the treatment, and 1 means the patient is certain to receive it.
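
Written as notation, with assumed symbols T_i (1 if patient i receives the treatment of interest, 0 otherwise) and X_i (the vector of measured confounders for patient i), the definition reads as below. The second line shows logistic regression, one common way to estimate the score rather than the only one.

```latex
e(X_i) = \Pr(T_i = 1 \mid X_i), \qquad 0 \le e(X_i) \le 1

\ln\frac{e(X_i)}{1 - e(X_i)} = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}
```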

Returning to the example of comparing second-line diabetes therapy users with metformin-only users, we would estimate the probability of receiving a second-line therapy for every patient in both treatment groups, based on the confounders included in the model.
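
A minimal sketch of that estimation step follows, assuming a logistic regression model and an invented cohort of 40 patients described by two hypothetical binary confounders: uncontrolled diabetes and age 65 or older. The function names (`build_cohort`, `fit_logistic`, `propensity_scores`) are illustrative, and the model is fit with plain gradient ascent on the log-likelihood so the example needs only the Python standard library; a real analysis would use an established statistics package and many more confounders.

```python
import math

def build_cohort():
    """Invented cohort. Each row: (uncontrolled_diabetes, age_65_plus, received_second_line)."""
    rows = []
    # (uncontrolled, older, n_patients, n_treated) for each covariate pattern
    for uncontrolled, older, n, n_treated in [(1, 1, 10, 8), (1, 0, 10, 7), (0, 1, 10, 3), (0, 0, 10, 2)]:
        rows += [(uncontrolled, older, 1)] * n_treated
        rows += [(uncontrolled, older, 0)] * (n - n_treated)
    return rows

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=1.0, n_iter=3000):
    """Batch gradient ascent on the mean log-likelihood; no regularization."""
    beta = [0.0] * (len(X[0]) + 1)  # beta[0] is the intercept
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi, start=1):
                grad[j] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of receiving the second-line therapy for each row of X."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]

cohort = build_cohort()
X = [(u, o) for u, o, _ in cohort]
y = [t for _, _, t in cohort]
beta = fit_logistic(X, y)
ps = propensity_scores(X, beta)

# One score per covariate pattern; patients in both treatment groups receive a score
for pattern in sorted(set(X), reverse=True):
    print(pattern, round(propensity_scores([pattern], beta)[0], 3))
```

Note that patients who share a covariate pattern share a score regardless of which treatment they actually received; that shared score is what makes matching on it possible.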

The resulting score can then be used in several ways to control for confounding. For example, we can match patients between treatment groups on their propensity scores. A patient in the second-line therapy group with a propensity score of 0.6 is matched to a patient in the metformin-only group with a propensity score of 0.6 (in practice, one within a small tolerance of it). Both patients theoretically had a 60 percent chance of receiving the second-line therapy given the included confounders, but only one of them did. This process is repeated until all possible matches between the groups have been made. At the population level, matching tends to balance the measured confounders between the treatment groups. This creates populations that are exchangeable with respect to those confounders, in which causal conclusions about treatment effects can be drawn.
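
The matching step can be sketched as greedy 1:1 nearest-neighbor matching without replacement. The patient IDs and scores below are hypothetical, and the caliper of 0.05 on the probability scale is an arbitrary illustrative tolerance; published guidance often expresses the caliper on the logit scale instead, and other matching algorithms (optimal matching, matching with replacement) exist.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor propensity score matching, without replacement.

    treated, controls: dicts mapping a (hypothetical) patient ID to a propensity score.
    Returns (treated_id, control_id) pairs. Treated patients with no unused control
    within `caliper` are left unmatched and drop out of the matched analysis.
    """
    available = dict(controls)
    pairs = []
    # Greedy matching depends on processing order; here, highest score first
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id, c_ps = min(available.items(), key=lambda kv: abs(kv[1] - t_ps))
        if abs(c_ps - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

treated = {"T1": 0.60, "T2": 0.82, "T3": 0.35, "T4": 0.95}
controls = {"C1": 0.61, "C2": 0.58, "C3": 0.33, "C4": 0.80, "C5": 0.10}
pairs = greedy_match(treated, controls)
print(pairs)
```

Here T4 finds no control close enough and goes unmatched, which illustrates that matching can change the population the final estimate describes.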

A successful propensity score analysis depends on having adequate data on confounders. The wealth of information in administrative claims and electronic health record (EHR) data lets us include many confounders in the propensity score, which can improve exchangeability between our two populations. Sebastian Schneeweiss, M.D., Sc.D., and colleagues developed and tested high-dimensional propensity scores (hdPS), which empirically identify and select potentially hundreds of confounders from the data source for inclusion in the propensity score. This approach allows researchers to capture both confounders and proxies for confounders, which can further improve exchangeability between treatment populations.
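
As we understand the published hdPS approach, one of its steps ranks candidate covariates by their potential to confound, using the Bross formula for multiplicative bias, which combines a covariate's prevalence among the exposed (P_C1) and unexposed (P_C0) with its association with the outcome (RR_CD). The sketch below shows only that prioritization step, under simplifying assumptions: binary covariates, a binary outcome, RR_CD computed as a crude risk ratio in the whole cohort, and an invented 10-patient dataset with hypothetical covariate names. It omits the other steps of the full algorithm, such as identifying data dimensions and assessing how often codes recur.

```python
import math

def bross_bias(p_c1, p_c0, rr_cd):
    """Multiplicative confounding bias from the Bross formula."""
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)

def prevalence(values):
    return sum(values) / len(values)

def rank_covariates(covariates, exposed, outcome):
    """Rank binary candidate covariates by |log(bias)|, largest first.

    covariates: dict mapping covariate name -> list of 0/1 values, one per patient.
    exposed, outcome: lists of 0/1 values, one per patient (both exposure groups non-empty).
    """
    scores = {}
    for name, c in covariates.items():
        p_c1 = prevalence([ci for ci, e in zip(c, exposed) if e == 1])
        p_c0 = prevalence([ci for ci, e in zip(c, exposed) if e == 0])
        with_c = [d for ci, d in zip(c, outcome) if ci == 1]
        without_c = [d for ci, d in zip(c, outcome) if ci == 0]
        if not with_c or not without_c or sum(with_c) == 0 or sum(without_c) == 0:
            continue  # risk ratio undefined; skipped in this sketch
        rr_cd = prevalence(with_c) / prevalence(without_c)
        rr_cd = max(rr_cd, 1 / rr_cd)  # treat harmful and protective associations alike
        scores[name] = abs(math.log(bross_bias(p_c1, p_c0, rr_cd)))
    return sorted(scores, key=scores.get, reverse=True)

# Invented data: patients 0-4 exposed, patients 5-9 unexposed
exposed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
outcome = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
covariates = {
    "code_A": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],  # imbalanced and strongly tied to outcome
    "code_B": [1, 0, 1, 0, 0, 1, 1, 0, 0, 0],  # tied to outcome but balanced by exposure
    "code_C": [1, 1, 0, 1, 0, 1, 0, 0, 1, 0],  # mildly imbalanced, weakly tied to outcome
}
ranking = rank_covariates(covariates, exposed, outcome)
print(ranking)
```

A covariate that is equally common in both exposure groups (code_B) gets a bias of exactly 1 and ranks last, however strongly it predicts the outcome, because it cannot confound the comparison.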

Because exchangeability is so critical, any comparative study using real-world data must be transparent about whether and how it was achieved. Transparency requires:

  1. Reporting details on the statistical methods used for propensity score modeling;
  2. Listing confounders included in the model;
  3. Defining how those confounders were measured; and
  4. Stating the confounder distributions within each group before and after propensity score matching.
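
For item 4, one common way to summarize each confounder's distribution before and after matching is the standardized mean difference (SMD), where an absolute value below 0.1 is often used as a rule of thumb for adequate balance. Below is a minimal sketch for a single binary confounder using the pooled-variance form; the patient lists are invented for illustration.

```python
import math

def smd_binary(treated, comparator):
    """Standardized mean difference for a 0/1 covariate (pooled-variance form)."""
    p_t = sum(treated) / len(treated)
    p_c = sum(comparator) / len(comparator)
    pooled_sd = math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)
    return 0.0 if pooled_sd == 0 else (p_t - p_c) / pooled_sd

# Invented indicator of uncontrolled diabetes (1 = uncontrolled) per patient
before_t = [1] * 8 + [0] * 2  # second-line group before matching
before_c = [1] * 2 + [0] * 8  # metformin-only group before matching
after_t = [1, 1, 1, 0, 0]     # matched second-line patients
after_c = [1, 1, 1, 0, 0]     # their matched metformin-only partners

for label, (t, c) in {"before matching": (before_t, before_c), "after matching": (after_t, after_c)}.items():
    print(f"{label}: SMD = {smd_binary(t, c):.2f}")
```

Unlike a p-value, the SMD does not depend on sample size, which is one reason it is often preferred for reporting balance.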

Confirming that appropriate confounders were measured and controlled for is vital for regulators and other decision-makers evaluating the validity of a given study. Only then can observational data enable us to draw causal conclusions about treatments and how they affect our health.