An Introduction to a Roadmap for Causal Inference



At Evidence for Action (E4A), we fund research on the causal effects of programs and policies on health. Causal questions are distinct from associational or descriptive questions. Answering causal questions requires us to rule out competing explanations for why an intervention, treatment, or exposure is associated with a health outcome. Knowing which interventions, policies, or systems have causal impacts lets us identify the strategies that are most likely to improve population health. This is especially true for identifying policies that affect racial equity. In prior blog posts, we have written about research designs and methods for determining causal effects (see here, here, and here).

In this post, we highlight a general roadmap for answering causal questions. Sometimes the roadmap reveals that the original research question cannot be answered with the available data and guides us to specify a research question that is “as close as possible” to the motivating causal question (see roadmap paper). We think this roadmap, originally developed by biostatisticians at UC Berkeley, is a useful framework for approaching causal questions. It breaks down the process of asking and answering causal questions into seven steps that clearly separate causal modeling practices from the statistical estimation of causal effects. The seven steps in the roadmap are presented in the next section. In a linked Methods Note, we discuss how the roadmap applies to answering causal questions that are relevant to E4A-funded research.

Putting Evidence into Practice: A Roadmap for Asking and Answering Causal Questions

The causal roadmap focuses on delineating the steps and assumptions necessary to make causal inferences or answer causal questions. The steps in the roadmap are agnostic to the tools and methods used to derive causal inferences. Instead, the roadmap shows how to use traditional causal inference tools in a way that makes explicit the assumptions and evidence that support answering causal questions.

The seven steps in the general roadmap for causal inference are listed below. To situate the roadmap steps in practice, we draw on an example E4A funded study that can be viewed through the roadmap: Impact of Greening on Cardiovascular Disease (CVD) in Low-Income Miami Neighborhoods. In the Methods Note for this article, we provide additional details on the steps of the roadmap and this evaluation of greening at each step in the causal roadmap.

  1. Specify knowledge about the system to be studied using a causal model: What do we already know? For example, prior research reports that higher neighborhood greenness is associated with lower rates of cardiovascular disease (CVD) among neighborhood residents, but several potential confounders are likely to bias a simple associational analysis. In particular, neighborhood income, crime, racial/ethnic segregation and discrimination, and density of elderly population might influence both greening rates and CVD. In the current study, the investigators knew that variations in neighborhood greenness were being induced by city and county policies that arbitrarily selected certain low-income neighborhoods for greening initiatives.
  2. Specify the observed data and how it arose: What data will be used or collected for this study? What variables will be available in this data? For whom are these variables measured? The greening evaluation used observed data on greening/greenness from satellite records and incident CVD recorded in Medicare beneficiary claims for people who resided in low-income Miami-Dade County census blocks. An advantage of Medicare data is that there is less selection into participation than in data from research studies that require participant enrollment.
  3. Specify what you want to learn, i.e., what causal effect are you trying to estimate? The investigators sought to evaluate the ratio of the risk of incident CVD that would occur if all neighborhoods were exposed to a greening intervention to the risk of incident CVD that would occur if no neighborhoods were exposed to a greening intervention.
  4. Assess whether, given what you already know and the data you have available, it is possible to estimate the causal effect you specified in step 3. To validly estimate the effect of a greening intervention, the critical assumption (called “exchangeability”) is that, before the greening intervention, the CVD risk of people who live in neighborhoods that receive a greening intervention did not differ on average from the CVD risk of people who live in neighborhoods that did not receive an intervention, after accounting for the measured covariates (e.g., neighborhood income, crime, race/ethnicity, and age). In other words, we must assume that the measured potential confounders that can be controlled in the analysis are sufficient to account for all of the confounders of the greening-CVD relationship. We must also assume that the individuals captured in the study data are representative of all individuals in the target population (i.e., the population about whom we would like to make inferences) and that the intervention of “greening” is clearly defined.
  5. Revisit your assumptions (step 1), available data (step 2), and causal question (step 3) until you have settled on a question that can be answered given what you already know and the data you have available. Commit to a specific causal effect measure and statistical model representing the knowledge available to you. The assumption of exchangeability seems plausible because the neighborhoods were chosen for greening interventions quasi-randomly. The causal effect can then be estimated with a statistical model by contrasting the observed CVD incidence in neighborhoods with consistently low greenness against the CVD incidence in neighborhoods where the greening intervention was fielded, changing greenness from low to high, conditional on measured covariates. Note this step explains how we estimate a causal contrast (which is what we want to know but is not directly observable) from a statistical contrast (which is directly observable in the data available).
  6. Estimate. The investigators chose to use a multilevel generalized linear model with a Poisson distribution and a log link function to estimate the relative risk of incident CVD in neighborhoods where greenness increased from low to high during the follow-up period compared to neighborhoods where greenness remained low during the follow-up period. The investigators planned to estimate the multilevel Poisson model in SAS.
  7. Interpret. The greening study is still underway, so there are no results to interpret yet. At this step, it will be important to clearly state the assumptions that need to be made to interpret the statistical estimate as the causal effect of greening on CVD incidence.
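
For readers who prefer notation, steps 3 through 5 in the list above can be written compactly. This is a sketch under assumed notation that does not appear in the roadmap text itself: A = 1 if a neighborhood is greened, Y = 1 for incident CVD, W for the measured covariates, and Y^a for the outcome that would occur under exposure level a. Moving from the causal target to the statistical target also relies on a clearly defined intervention (often called consistency) and on both greened and non-greened neighborhoods being observed at every covariate level (often called positivity).

```latex
% Assumed notation for illustration only: A = greening indicator,
% Y = incident CVD, W = measured covariates, Y^{a} = potential outcome.
\begin{align*}
\text{Step 3 (causal target):}\quad
  & \psi = \frac{P(Y^{a=1}=1)}{P(Y^{a=0}=1)} \\[4pt]
\text{Step 4 (exchangeability):}\quad
  & Y^{a} \perp\!\!\!\perp A \mid W \quad \text{for } a \in \{0,1\} \\[4pt]
\text{Step 5 (statistical target):}\quad
  & \psi = \frac{\sum_{w} P(Y=1 \mid A=1, W=w)\,P(W=w)}
               {\sum_{w} P(Y=1 \mid A=0, W=w)\,P(W=w)}
\end{align*}
```

The last line is the quantity a statistical model can estimate from observed data; it equals the first line only if the step 4 assumptions hold.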

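To make the difference between a statistical contrast and a causal contrast in step 5 concrete, here is a minimal Python sketch. It uses invented counts, not data from the greening study, and a single binary covariate (a hypothetical lower-income indicator) standing in for the measured confounders. It applies simple standardization over covariate strata, which is one common way to turn conditional risks into the "all greened versus none greened" contrast; it is not the multilevel Poisson model the investigators planned to use.

```python
# Hypothetical counts, invented for illustration only (not study data).
# Each row: (lower_income stratum, greened flag, residents, incident CVD cases)
COUNTS = [
    (0, 1, 800, 40),
    (0, 0, 200, 20),
    (1, 1, 200, 20),
    (1, 0, 800, 160),
]


def crude_risk(counts, greened):
    """Observed risk among residents with this exposure, ignoring covariates."""
    people = sum(n for _, a, n, _ in counts if a == greened)
    cases = sum(y for _, a, _, y in counts if a == greened)
    return cases / people


def standardized_risk(counts, greened):
    """Risk if everyone had this exposure: stratum-specific risks
    weighted by each stratum's share of the whole population."""
    total = sum(n for _, _, n, _ in counts)
    risk = 0.0
    for stratum in sorted({w for w, _, _, _ in counts}):
        stratum_size = sum(n for w, _, n, _ in counts if w == stratum)
        n_exposed = sum(n for w, a, n, _ in counts
                        if w == stratum and a == greened)
        y_exposed = sum(y for w, a, _, y in counts
                        if w == stratum and a == greened)
        risk += (y_exposed / n_exposed) * (stratum_size / total)
    return risk


# Statistical contrast that ignores the covariate.
crude_rr = crude_risk(COUNTS, 1) / crude_risk(COUNTS, 0)

# Contrast that matches the step 3 question, valid only if the covariate
# captures all confounding (the exchangeability assumption in step 4).
standardized_rr = standardized_risk(COUNTS, 1) / standardized_risk(COUNTS, 0)

print(f"crude risk ratio:        {crude_rr:.3f}")
print(f"standardized risk ratio: {standardized_rr:.3f}")
```

In these invented counts, greened neighborhoods are mostly in the higher-income stratum, so the crude ratio (about 0.33) makes greening look more protective than the ratio within either income stratum (0.5). Standardizing over the covariate recovers 0.5, but only because the example was built so that income is the sole confounder.
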
Tools and Resources

We hope that the roadmap and Methods Note help researchers navigate to causal inferences! Below is a link to the causal roadmap paper as well as an introductory article on structural causal models.


About the author(s)

Dakota W. Cintron, PhD, EdM, MS, and Ellicott Matthay, PhD, MPH, are postdoctoral scholars for the E4A Methods Laboratory and frequent contributors to the E4A Blog. 

Maria Glymour, ScD, MS, is an E4A Associate Director and leads the E4A Methods Laboratory.
