How to estimate causal effects with social science data?
Two examples:
breakfast cereal example
health savings experiment
Propose a theoretical model (that includes a causal mechanism) to explain the observation.
Suppose our causal mechanism is that this cereal sticks to a magnet because of its iron content.
A testable hypothesis: cereals that are high in iron stick to a magnet.
Unit of analysis: cereal type
Treatment variable (causal variable of interest) T: iron content (high/low)
Treatment group (treated units): Total cereal
Control group (untreated units): Honey Smacks cereal
Outcome variable (response variable) Y: whether it sticks to a magnet (yes/no).
Counterfactual: Would Total cereal stick to a magnet if it did not have iron?
Two potential outcomes: Y(1) and Y(0)
Causal effect: Y(1)−Y(0)
Fundamental problem of causal inference: only one of the two potential outcomes is observable.
Cannot calculate individual causal effect Y(1)−Y(0)!
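A minimal sketch of the fundamental problem, using made-up potential outcomes for four hypothetical units (the values and field names are illustrative assumptions, not data from the cereal example). Both potential outcomes are visible here only because the data are invented; in a real study each unit reveals just one of them.

```python
# Hypothetical potential outcomes (1 = sticks to magnet, 0 = does not).
# "treated" records which of the two outcomes we would actually observe.
units = [
    {"id": "A", "y1": 1, "y0": 0, "treated": True},
    {"id": "B", "y1": 1, "y0": 1, "treated": False},
    {"id": "C", "y1": 0, "y0": 0, "treated": True},
    {"id": "D", "y1": 1, "y0": 0, "treated": False},
]


def observed_outcome(unit):
    """Return the outcome we actually see: Y(1) if treated, else Y(0)."""
    return unit["y1"] if unit["treated"] else unit["y0"]


def individual_effect(unit):
    """Y(1) - Y(0); computable only because these data are invented."""
    return unit["y1"] - unit["y0"]


for unit in units:
    # The counterfactual outcome is hidden from the researcher.
    hidden = "Y(0)" if unit["treated"] else "Y(1)"
    print(unit["id"], "observed:", observed_outcome(unit), "| unobserved:", hidden)
```

With real data only the `observed_outcome` column exists, so `individual_effect` cannot be computed for any single unit.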
Importance of control group.
Association is not causation! (there could be confounders)
A confounding variable is a pre-treatment variable that is associated with both the treatment and the outcome and may bias our estimate of the treatment effect.
Matching: Find a unit that is the same as those in the treatment group, except for the treatment.
Is Honey Smacks a good match for Total cereal?
Find another healthy cereal that has the same ingredients (other than iron), texture, etc.
NJ increased the minimum wage. Does an increase in the minimum wage lead to unemployment?
Find a similar state to NJ that did not increase min wage.
Match to rule out potential confounders: e.g., compare only Burger Kings in urban areas
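One way to picture matching is to compare treated and control units only within cells that share the same covariate values. The sketch below assumes a handful of invented restaurant records; the state label `OTHER`, the field layout, and the outcome numbers are all hypothetical placeholders rather than data from the minimum wage study.

```python
from collections import defaultdict

# Hypothetical records: (state, location_type, chain, employment_outcome)
restaurants = [
    ("NJ", "urban", "Burger King", 20),
    ("OTHER", "urban", "Burger King", 18),
    ("NJ", "rural", "Burger King", 12),
    ("OTHER", "urban", "Burger King", 22),
    ("NJ", "urban", "Burger King", 24),
]


def matched_difference(records, treated_state="NJ"):
    """Average treated-minus-control gap within cells matched on location and chain."""
    cells = defaultdict(lambda: {"treated": [], "control": []})
    for state, location, chain, outcome in records:
        group = "treated" if state == treated_state else "control"
        cells[(location, chain)][group].append(outcome)

    gaps = []
    for groups in cells.values():
        # Keep only cells that contain both treated and control units.
        if groups["treated"] and groups["control"]:
            treated_mean = sum(groups["treated"]) / len(groups["treated"])
            control_mean = sum(groups["control"]) / len(groups["control"])
            gaps.append(treated_mean - control_mean)
    return sum(gaps) / len(gaps) if gaps else None


print(matched_difference(restaurants))
```

The rural record is dropped because it has no control-state counterpart, which illustrates the cost of matching: units without a comparable match contribute nothing to the estimate.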
Are Black people less likely to get job offers?
Cannot match on everything
Unobserved confounders: variables associated with treatment and outcome
Selection bias is confounding bias due to participant self-selection into the treatment/control groups.
Can you think of examples?
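A small simulation can make selection bias concrete. In the sketch below, which rests entirely on assumed numbers, an unobserved trait makes units both more likely to choose the treatment and more likely to have a high outcome. The true treatment effect is set to zero, yet the naive comparison of group means is clearly positive.

```python
import random


def naive_difference(n=5000, seed=1):
    """Treated-minus-control mean outcome when units self-select into treatment."""
    rng = random.Random(seed)
    treated_outcomes, control_outcomes = [], []
    for _ in range(n):
        confounder = rng.gauss(0, 1)  # unobserved pre-treatment trait
        # Units with a higher confounder value are more likely to opt in.
        takes_treatment = confounder + rng.gauss(0, 1) > 0
        # The outcome depends on the confounder only: the true effect is 0.
        outcome = 2.0 * confounder + rng.gauss(0, 1)
        if takes_treatment:
            treated_outcomes.append(outcome)
        else:
            control_outcomes.append(outcome)
    treated_mean = sum(treated_outcomes) / len(treated_outcomes)
    control_mean = sum(control_outcomes) / len(control_outcomes)
    return treated_mean - control_mean


print(naive_difference())
```

The gap printed here comes entirely from who selected into treatment, not from the treatment itself.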
Suppose we are interested in the causal effect of alcohol on depression
Propose a research design, such that:
units in the treatment group engage in high alcohol consumption, while units in the control group do not
the treatment and the control group do not differ in other ways that may be correlated with alcohol consumption and depression
units did not self-select into treatment/control groups
Random assignment of participants (observations) by the researcher into the treatment/control groups.
Key idea: Randomization of the treatment makes the treatment and control groups “identical” on average
The two groups are similar in terms of all (both observed and unobserved) characteristics
Can attribute the average differences in outcome to the difference in the treatment
Sample Average Treatment Effect: $\text{SATE} = \frac{1}{n}\sum_{i=1}^{n}\{Y_i(1) - Y_i(0)\}$
SATE is not observable, but it can be estimated by the difference in group means, $\bar{Y}(1) - \bar{Y}(0)$
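The following sketch simulates why the difference in means works under random assignment. All numbers are assumptions chosen for illustration: each unit gets a hypothetical baseline outcome and a constant treatment effect of 2, so the SATE is known by construction and can be compared with the estimate.

```python
import random


def simulate_experiment(n=1000, seed=0):
    """Return (true SATE, difference-in-means estimate) for one simulated experiment."""
    rng = random.Random(seed)
    # Hypothetical potential outcomes: baselines vary, the effect is 2 for everyone.
    y0 = [rng.gauss(10, 3) for _ in range(n)]
    y1 = [baseline + 2.0 for baseline in y0]
    sate = sum(a - b for a, b in zip(y1, y0)) / n

    # The researcher randomly assigns exactly half of the units to treatment.
    treated = set(rng.sample(range(n), n // 2))
    treated_obs = [y1[i] for i in range(n) if i in treated]      # we see Y(1) only
    control_obs = [y0[i] for i in range(n) if i not in treated]  # we see Y(0) only

    estimate = sum(treated_obs) / len(treated_obs) - sum(control_obs) / len(control_obs)
    return sate, estimate


sate, estimate = simulate_experiment()
print("true SATE:", round(sate, 2), "| estimate:", round(estimate, 2))
```

Because assignment is unrelated to the baseline outcomes, the two groups have similar baselines on average, and the estimate lands close to the SATE without ever observing both potential outcomes for any unit.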
Randomized experiments are the gold standard
Double-blind experiments: guard against placebo effects and Hawthorne effects
The Hawthorne effect is a change in behavior that occurs because participants know they are being studied.
Question: How can we encourage people to save for emergency healthcare?¹
Small investments in preventative health products (e.g., bed nets, water filters) can save many lives in developing countries
Hypothesis: simple saving technologies can increase investments
A randomized field experiment in Kenya
Outcome: amount of savings for health products 6 and 12 months later
1 Dupas, Pascaline and Jonathan Robinson. 2013. “Why Don’t the Poor Save More? Evidence from Health Savings Experiments.” American Economic Review, Vol. 103, No. 4, pp. 1138-1171.
Randomized treatment
Outcome measured in follow-up surveys after 6 and 12 months
Why have a control group rather than compare the treatment groups to the savings rates in the population?
Does the drop-out rate differ across the treatment conditions? What does this result suggest about the internal and external validity of this study?
Internal validity: the extent to which causal assumptions are satisfied in the study.
External validity: the extent to which the conclusions can be generalized beyond a particular study.
Control: $\bar{Y}(0) = 257.83$
Lockbox: $\bar{Y}_{T1}(1) = 307.83$
Safe box: $\bar{Y}_{T2}(1) = 408.22$
$\text{SATE}_{T1} = 307.83 - 257.83 = 50.00$
$\text{SATE}_{T2} = 408.22 - 257.83 = 150.39$
Balance across treatment conditions (group means)
variable | control | lockbox | safebox |
---|---|---|---|
Gender | 0.73 | 0.73 | 0.79 |
Age | 41.87 | 39.58 | 38.54 |
Marital status | 0.75 | 0.76 | 0.73 |
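A quick way to read the balance figures is to look at the largest gap between any two arms for each pre-treatment variable. The sketch below copies the group means reported above; the dictionary keys and the helper name `largest_gap` are assumptions made for illustration.

```python
# Group means copied from the balance figures reported above.
balance = {
    "gender": {"control": 0.73, "lockbox": 0.73, "safebox": 0.79},
    "age": {"control": 41.87, "lockbox": 39.58, "safebox": 38.54},
    "marital_status": {"control": 0.75, "lockbox": 0.76, "safebox": 0.73},
}


def largest_gap(means_by_arm):
    """Largest difference in means between any two arms, rounded to 2 decimals."""
    values = list(means_by_arm.values())
    return round(max(values) - min(values), 2)


gaps = {variable: largest_gap(arms) for variable, arms in balance.items()}
for variable, gap in gaps.items():
    print(variable, gap)
```

Small gaps are what randomization leads us to expect. Whether a given gap is small enough is a judgment that this sketch does not settle, since it ignores sample sizes and variability.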
Subset analysis
subset | control | lockbox | safebox |
---|---|---|---|
Married women only | 239.66 | 332.43 | 557.14 |
Unmarried women only | 218.54 | 220.47 | 264.04 |
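The subset figures can be turned into estimated effects the same way as the full-sample means: subtract the control mean from each treatment mean within the subset. This sketch assumes the reported numbers are mean outcomes per arm; the variable and function names are illustrative.

```python
# Subset means copied from the figures reported above.
subset_means = {
    "married": {"control": 239.66, "lockbox": 332.43, "safebox": 557.14},
    "unmarried": {"control": 218.54, "lockbox": 220.47, "safebox": 264.04},
}


def effects_vs_control(means_by_arm):
    """Difference between each treatment arm's mean and the control mean."""
    control_mean = means_by_arm["control"]
    return {
        arm: round(mean - control_mean, 2)
        for arm, mean in means_by_arm.items()
        if arm != "control"
    }


subset_effects = {name: effects_vs_control(arms) for name, arms in subset_means.items()}
for name, effects in subset_effects.items():
    print(name, effects)
```

The estimated differences are much larger among married women (92.77 and 317.48) than among unmarried women (1.93 and 45.50), which is the kind of heterogeneity a subset analysis is meant to surface.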
How to generate a research question (an inductive approach)
How to propose a theoretical model (causal mechanism) and derive a testable hypothesis
Concepts: unit of analysis, treatment/control variable, treatment/control group, outcome variable, counterfactual, potential outcomes, causal effect, confounders, matching, selection bias, random assignment, SATE, randomized experiment, placebo effect, Hawthorne effect, attrition/drop-out rate, internal and external validity, balance, subset analysis.
What was the goal of the cereal example? What did it demonstrate?
How does randomization account for confounders? What is the exact mechanism?
What is the difference between a natural experiment (quasi-experiment) and a randomized controlled trial?
What is a selection effect, and how do randomized controlled trials rule it out?
Install R and RStudio
Complete assigned readings
Next Class: causality, experimental design, a two-sample t-test