Many social science concepts are difficult or impossible to measure
Two distinct approaches:
Can use a proxy variable in lieu of the actual variable of interest
Can use an instrumental variable to help isolate the true effect of our variable (when our variable is endogenous, i.e., correlated with an unobserved variable that also affects y)
Consider the following model of log(wage):
$$\log(\text{wage}) = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{exper} + \beta_3 \text{abil} + u$$
Must account for ability, which is an unobserved variable.
Since ability is likely correlated with education and experience, excluding it will lead to biased estimates of $\beta_1$ and $\beta_2$.
It may be possible to avoid omitted variable bias by using a proxy variable.
A proxy variable is a variable that is correlated with the unobserved variable (e.g. IQ is a proxy for ability)
The proxy variable does not affect y directly; it is correlated with y solely because it is correlated with the unobserved variable for which it is a proxy.
The proxy variable captures all the correlation between the variable it stands for and other variables that affect y.
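The two proxy assumptions can be illustrated with a small simulation. The sketch below uses made-up data: every coefficient and the variable names `iq`, `abil`, `educ`, and `lwage` are assumptions chosen for illustration, not estimates from any dataset. Ability is built as IQ plus a remainder that is unrelated to education, so both assumptions hold by construction.

```r
set.seed(42)
n <- 5000

# Simulated data: every number below is an illustrative assumption.
iq    <- rnorm(n)                        # observed proxy
abil  <- 0.8 * iq + rnorm(n, sd = 0.5)   # unobserved; remainder is unrelated to educ
educ  <- 12 + 1.5 * iq + rnorm(n)        # correlated with ability only through IQ
lwage <- 1 + 0.08 * educ + 0.2 * abil + rnorm(n, sd = 0.3)

# Omitting ability: the educ coefficient absorbs part of ability's effect.
b_naive <- unname(coef(lm(lwage ~ educ))["educ"])

# Plugging in the proxy: the educ coefficient should be close to the true 0.08.
b_proxy <- unname(coef(lm(lwage ~ educ + iq))["educ"])

round(c(true = 0.08, naive = b_naive, proxy = b_proxy), 3)
```

Note that the coefficient on `iq` in the second regression does not recover the effect of ability itself; it estimates that effect multiplied by the slope of ability on IQ.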
Discuss whether the assumptions from the previous slide hold for each of these scenarios:
Suppose we want to model trade between pairs of countries as a function of each country's GDP and population, as well as the distance between them. The problem is that GDP data are not available prior to 1950, so we use each country's energy consumption as a proxy variable.
Suppose we want to model an individual's decision to vote in an election as a function of their income, education, age, and the opportunity cost of time. Unfortunately, we do not have a good measure of the opportunity cost of time, so we proxy it with the quality of the Monday Night Football game played the night before the election.
Scenario 1:
Scenario 2:
Matthew Potoski and R. Urbatsch. 2017. "Entertainment and the opportunity cost of civic participation: Monday night football game quality suppresses turnout in US elections." The Journal of Politics, 79(2): 424-438.
Emily Hencken Ritter and Courtenay R. Conrad. 2016. "Preventing and responding to dissent: The observational challenges of explaining strategic repression." American Political Science Review, 110(1): 85-99.
Why do people vote? What are the costs and benefits of voting?
How do Monday Night Football games alter the costs and benefits of voting? What is the causal mechanism?
How do the authors measure the quality of the game?
Why does it matter if a local team plays in a pre-election game? How do the authors measure this? Why do the authors say that it is a noisy measure?
The quality of the game is an index that combines three measures: team records, point spread, and over/under.
The authors consider anyone from the same census-defined metropolitan area as the playing teams to be "local."
How does interest in politics interact with the opportunity costs of voting? Why? How do the authors measure interest in politics? Why do they need to use several alternative measures?
How does the start time of the game affect turnout?
What are the placebo tests that the authors use? Why do they need these? What do they show?
Self-reported interest in politics may be subject to social-desirability bias and is unavailable for a few years of the NES surveys. An alternative measure of political interest that sidesteps these drawbacks of the direct measure is the intensity of partisan identification.
Placebo tests: the authors check whether MNF game quality on the evening before the election influences either early/absentee voting or voter registration in states that do not allow it.
$$\text{Turnout} = \beta_0 + \beta_1 \text{Benefit} + \beta_2 \text{Cost} + u,$$
where $\beta_0$, $\beta_1$, and $\beta_2$ are population parameters, and $u$ is the random error.
MNF game quality is a proxy for cost if two assumptions are met:
MNF game quality affects voting only insofar as it increases the opportunity cost of time.
Opportunity cost of time is uncorrelated with other variables that are included in the model, once we account for the quality of the pre-election game.
What do the authors mean when they say that the relationship between repression and dissent is endogenous?
What are the two strategic censoring processes that result in the observed outcomes of dissent and repression?
Why is rainfall correlated with dissent?
Does rainfall affect repression?
What is the Law of Coercive Responsiveness?
Why do governments repress?
What are the alternatives to repression?
How does repression prevent dissent?
Why do some groups dissent despite the expectation of repression? Why do some groups dissent despite repression?
If preventive repression deters some dissent, why don't governments always engage in preventive repression?
Why do the authors use the natural log of daily rainfall rather than rainfall deviations from the norm? Why do they include an additional instrument, the percent of annual rainfall that the daily amount represents?
Suppose we want to know the effect of X on Y, but we think that Y may also, in part, determine X;
In this case, we say that X and Y are endogenous;
Very common in social sciences;
Examples: conflict and alliances, arms races, democracy, trade; supply and demand; domestic institutions (political, legal, economic) and dissent.
$$\text{Repression} = \beta_0 + \beta_1 \text{Dissent} + u,$$
where $\beta_0$ and $\beta_1$ are population parameters to estimate, and $u$ is the random error.
Problem: Dissent is correlated with an unobservable variable that affects repression, Protesters' Resolve, which means we cannot obtain an unbiased estimate of $\beta_1$.
Solution: Use an instrumental variable, Rainfall.
[1] Regress Dissent on Rainfall.
$$\text{Dissent} = \delta_0 + \delta_1 \text{Rain} + \nu$$
[2] Regress Repression on the fitted values from the first equation:
$$\text{Repression} = \beta_0 + \beta_1 \widehat{\text{Dissent}} + u$$
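The two steps can be sketched on simulated data. Everything in the block below is an assumption made for illustration: the coefficients, the sign of each effect, and the variable names (`rain`, `resolve`, `dissent`, `repress`) are invented and do not come from the Ritter and Conrad data. The simulated "resolve" plays the role of the unobserved variable that drives both dissent and repression.

```r
set.seed(1)
n <- 5000

# Simulated data: all coefficients are illustrative assumptions.
rain    <- rnorm(n)                                # instrument
resolve <- rnorm(n)                                # unobserved protesters' resolve
dissent <- 1 - 0.5 * rain + resolve + rnorm(n)
repress <- 2 + 0.5 * dissent + resolve + rnorm(n)  # true effect of dissent is 0.5

# OLS is biased: resolve sits in the error term and also drives dissent.
b_ols <- unname(coef(lm(repress ~ dissent))["dissent"])

# [1] First stage: regress dissent on rain and keep the fitted values.
stage1      <- lm(dissent ~ rain)
dissent_hat <- fitted(stage1)

# [2] Second stage: regress repression on the fitted values.
b_2sls <- unname(coef(lm(repress ~ dissent_hat))["dissent_hat"])

round(c(true = 0.5, ols = b_ols, two_stage = b_2sls), 3)
```

In this setup the OLS slope should land well above 0.5, while the two-stage slope should be close to it, because the fitted values contain only the variation in dissent that comes from rain.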
Suppose our model is:
$y = \beta_0 + \beta_1 x + u$, where we think that $x$ and $u$ are correlated, i.e., $\text{Cov}(x, u) \neq 0$.
A valid instrument for x is a variable z, such that:
z is uncorrelated with u, or Cov(z,u)=0;
This condition is known as instrument exogeneity
That is, z is correlated with y only because it is correlated with x; z does not have its own partial effect on y after we account for x and other controls.
z is correlated with x, or Cov(x,z)≠0.
This condition is called instrument relevance.
We can, and should, test this assumption by regressing x on z.
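A short derivation may help show why both conditions are needed in the simple model with one regressor and one instrument. Taking the covariance of both sides of the model with z gives:

```latex
\begin{aligned}
\operatorname{Cov}(z, y) &= \beta_1 \operatorname{Cov}(z, x) + \operatorname{Cov}(z, u) \\
                         &= \beta_1 \operatorname{Cov}(z, x)
   && \text{(instrument exogeneity: } \operatorname{Cov}(z, u) = 0\text{)} \\
\Rightarrow \quad \beta_1 &= \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}
   && \text{(instrument relevance: } \operatorname{Cov}(z, x) \neq 0\text{)}
\end{aligned}
```

Replacing the population covariances with sample covariances gives the IV estimator for this single-instrument case. Exogeneity removes the error term; relevance ensures the denominator is not zero.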
Suppose we are interested in the effect of education on income. Why can't we simply estimate this effect using OLS?
Load the mroz data from the wooldridge package.
library(tidyverse)
library(wooldridge)
data(mroz)
mydata <- mroz %>% filter(!is.na(lwage))
summary(m1 <- lm(lwage ~ educ, data = mydata))
##
## Call:
## lm(formula = lwage ~ educ, data = mydata)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -3.10256 -0.31473  0.06434  0.40081  2.10029
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.1852     0.1852  -1.000    0.318
## educ          0.1086     0.0144   7.545 2.76e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.68 on 426 degrees of freedom
## Multiple R-squared:  0.1179, Adjusted R-squared:  0.1158
## F-statistic: 56.93 on 1 and 426 DF,  p-value: 2.761e-13
Regress educ on fatheduc to check instrument relevance.
summary(m2<-lm(educ~fatheduc, data=mydata))
##
## Call:
## lm(formula = educ ~ fatheduc, data = mydata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.4704 -1.1231 -0.1231  0.9546  5.9546
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.23705    0.27594  37.099   <2e-16 ***
## fatheduc     0.26944    0.02859   9.426   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.081 on 426 degrees of freedom
## Multiple R-squared:  0.1726, Adjusted R-squared:  0.1706
## F-statistic: 88.84 on 1 and 426 DF,  p-value: < 2.2e-16
summary(m3<-lm(lwage~m2$fitted.values, data=mydata))
##
## Call:
## lm(formula = lwage ~ m2$fitted.values, data = mydata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.2126 -0.3763  0.0563  0.4173  2.0604
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)       0.44110    0.46711   0.944    0.346
## m2$fitted.values  0.05917    0.03680   1.608    0.109
##
## Residual standard error: 0.7219 on 426 degrees of freedom
## Multiple R-squared:  0.006034, Adjusted R-squared:  0.003701
## F-statistic: 2.586 on 1 and 426 DF,  p-value: 0.1086
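Two points about the manual second stage may be worth checking. First, with a single instrument the slope should equal the ratio of sample covariances, Cov(fatheduc, lwage) / Cov(fatheduc, educ). Second, the standard errors that `lm()` reports for the second stage are not valid, because they are based on residuals computed from the fitted values rather than from the actual `educ`. The sketch below recomputes both by hand under the usual homoskedasticity assumption. The object names (`b_iv`, `se_iv`, and so on) are introduced here for illustration, and base R `subset()` stands in for the tidyverse filter.

```r
library(wooldridge)
data(mroz)
mydata <- subset(mroz, !is.na(lwage))

# Manual two-stage estimate, mirroring the regressions above.
stage1   <- lm(educ ~ fatheduc, data = mydata)
educ_hat <- fitted(stage1)
stage2   <- lm(mydata$lwage ~ educ_hat)
b_2sls   <- unname(coef(stage2)["educ_hat"])

# With one instrument, the same slope is a ratio of sample covariances.
b_iv <- with(mydata, cov(fatheduc, lwage) / cov(fatheduc, educ))

# Valid residuals use the actual educ, not the first-stage fitted values.
a_iv   <- unname(coef(stage2)["(Intercept)"])
res_iv <- mydata$lwage - a_iv - b_2sls * mydata$educ
sigma2 <- sum(res_iv^2) / (nrow(mydata) - 2)
se_iv  <- sqrt(sigma2 / sum((educ_hat - mean(educ_hat))^2))

c(two_stage = b_2sls, cov_ratio = b_iv, corrected_se = se_iv)
```

In practice a dedicated IV routine would normally be used instead of the two manual `lm()` calls, since it reports the appropriate standard errors directly.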
Evaluate the validity of each of the following instruments:
Rain as an instrument for election turnout if the DV is Democrats' vote share in a US election.
Rain as an instrument for protests if the DV is government repression.
Mountains as an instrument for the government's ability to detect and track rebel groups if the DV is civil war.
IQ as an instrument for education if the DV is wage.
Distance from a four-year college as an instrument for education if the DV is wage.
Sibling's education as an instrument for education if the DV is wage.
Cloud coverage as an instrument for drone strikes if the DV is civilian casualties.