POL 304: Using Data to Understand Politics and Society

Linear Regression
Olga Chyzh [www.olgachyzh.com]

Today's Agenda

Review of Section 4.3 of QSS Chapter 4
- Regression and causality
- Regression with multiple predictors (categorical IV)

Which Person is the More Competent?

2004 Wisconsin Senate Race
Russ Feingold (D) 55% vs. Tim Michels (R) 44%

Facial Competence and Vote Share

Best Fit Line

Linear Regression Model

Model

$\begin{array}{rcl} Y & = & \underbrace{α}_{\text{intercept}} + \underbrace{β}_{\text{slope}} X + \underbrace{ϵ}_{\text{error term}} \end{array}$

$Y$ : dependent/outcome/response variable
$X$ : independent/explanatory variable, predictor
$(α, β)$ : coefficients (parameters of the model)
$ϵ$ : unobserved error/disturbance term (mean zero)

Interpretation:

$α + β X$ : mean of $Y$ given the value of $X$
- This is the regression line
$β$ : change in the mean of $Y$ associated with a one-unit increase in $X$
- For every 1-unit increase in $X$ , $Y$ changes by $β$ on average
- This works in reverse as well: for every 1-unit decrease in $X$ , $Y$ changes by $- β$ on average
$α$ : the mean of $Y$ when $X$ is zero
- Be careful! This number is not always meaningful, for example when $X = 0$ lies outside the range of the data
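
The interpretation above can be checked numerically. The following is a minimal sketch on simulated data (the variable names, the seed, and the true values of 2 and 0.5 are assumptions made for illustration, not the lecture's data):

```r
# Simulated data (hypothetical): true intercept 2, true slope 0.5
set.seed(304)
x <- runif(200, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(200, sd = 1)

fit_toy <- lm(y ~ x)
alpha_hat <- unname(coef(fit_toy)["(Intercept)"])
beta_hat <- unname(coef(fit_toy)["x"])

# Predicted mean of Y at X = 4 and at X = 5
pred <- predict(fit_toy, newdata = data.frame(x = c(4, 5)))

# A one-unit increase in X changes the prediction by exactly beta_hat
slope_gap <- unname(diff(pred)) - beta_hat

# The prediction at X = 0 is the estimated intercept
pred_zero <- unname(predict(fit_toy, newdata = data.frame(x = 0)))
intercept_gap <- pred_zero - alpha_hat

c(slope_gap = slope_gap, intercept_gap = intercept_gap)  # both numerically zero
```

Both gaps are zero up to floating-point error, which is just the statement that the fitted line is $\hat{α} + \hat{β} X$.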

Women as Policy Makers

Do women promote different policies than men?
Observational studies: compare policies adopted by female politicians with those adopted by male politicians
Randomized natural experiment:
- one third of village council head positions are reserved for women
- assigned at the level of Gram Panchayat (GP) since mid-1990s
- each GP has multiple villages
What does the effect of female politicians mean?
Hypothesis: female politicians represent the interests of female voters
Female voters complain about drinking water while male voters complain about irrigation

Does the reservation policy increase the number of female politicians?

Proportions of women in reserved/non-reserved GP:

Reserved	Not Reserved
1	0.075

Does it change the policy outcomes?

## drinking-water facilities
mean(women$water[women$reserved == 1]) -
    mean(women$water[women$reserved == 0])

## [1] 9.252423

## irrigation facilities
mean(women$irrigation[women$reserved == 1]) -
    mean(women$irrigation[women$reserved == 0])

## [1] -0.3693319

Slope Coefficient = Difference-in-Means Estimator

Randomization enables a causal interpretation of the estimated regression coefficient $⇝$ without randomization, this interpretation is not always warranted

mean(women$water[women$reserved == 1]) -
    mean(women$water[women$reserved == 0])

## [1] 9.252423

lm(water ~ reserved, data = women)

## 
## Call:
## lm(formula = water ~ reserved, data = women)
## 
## Coefficients:
## (Intercept)     reserved  
##      14.738        9.252
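
With a single binary predictor, the intercept is the mean of the outcome in the group coded 0 and the slope is the difference in means between the two groups. A small sketch on simulated data (the data frame `toy` and its columns are hypothetical, introduced only for illustration) shows that the two calculations agree:

```r
# Simulated data (hypothetical): 50 untreated and 50 treated units
set.seed(1)
toy <- data.frame(treated = rep(c(0, 1), each = 50))
toy$outcome <- 10 + 3 * toy$treated + rnorm(100)

# Difference-in-means estimator
diff_means <- mean(toy$outcome[toy$treated == 1]) -
    mean(toy$outcome[toy$treated == 0])

# Regression on the binary indicator
fit_bin <- lm(outcome ~ treated, data = toy)

# Slope equals the difference in means
slope_matches <- isTRUE(all.equal(diff_means,
                                  unname(coef(fit_bin)["treated"])))

# Intercept equals the mean of the group coded 0
intercept_matches <- isTRUE(all.equal(mean(toy$outcome[toy$treated == 0]),
                                      unname(coef(fit_bin)["(Intercept)"])))

c(slope_matches, intercept_matches)
```

Read the same way, the output above says that non-reserved GPs average 14.738 on the water variable and reserved GPs average 14.738 + 9.252.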

Linear Regression with Multiple Predictors

The model:

$\begin{array}{rcl} Y & = & α + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{p} X_{p} + ϵ \end{array}$

Sum of squared residuals (SSR):

$\begin{array}{rcl} SSR & = & \sum_{i = 1}^{n} {\hat{ϵ}}_{i}^{2} = \sum_{i = 1}^{n} (Y_{i} - \hat{α} - {\hat{β}}_{1} X_{i 1} - {\hat{β}}_{2} X_{i 2} - \dots - {\hat{β}}_{p} X_{i p})^{2} \end{array}$
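
The SSR formula can be written out directly and compared with what `lm()` reports. This is a sketch on simulated data with two predictors; the helper `ssr()` and all variable names are assumptions made for this example:

```r
# Simulated data (hypothetical) with two predictors
set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 1 * x2 + rnorm(n)

# SSR as a function of candidate coefficients (alpha, beta1, beta2)
ssr <- function(par) {
    sum((y - par[1] - par[2] * x1 - par[3] * x2)^2)
}

fit_ssr <- lm(y ~ x1 + x2)
ssr_ols <- ssr(coef(fit_ssr))

# Same number as the sum of squared residuals from the fitted model
same_ssr <- isTRUE(all.equal(ssr_ols, sum(resid(fit_ssr)^2)))

# Moving any coefficient away from the least squares estimate raises the SSR
ssr_shifted <- ssr(coef(fit_ssr) + c(0, 0.1, 0))

c(ssr_ols = ssr_ols, ssr_shifted = ssr_shifted)
```

The second comparison illustrates what "least squares" means: the estimated coefficients are the values that make the SSR as small as possible.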

Multiple Regression

Most outcomes of interest $Y$ are multi-causal;
Researchers are often interested in isolating the effect of a hypothesized, theoretically relevant variable $X$ ;
Use multiple regression to statistically "control for" other causal variables $Z$ .

Multiple Regression

Move from a simple regression of:

$Y_{i} = α + β X_{i} + ϵ_{i}$

to a multiple regression of:

$Y_{i} = α + β_{1} X_{i} + β_{2} Z_{i} + ϵ_{i},$

where $Y_{i}$ is the dependent variable, $X_{i}$ and $Z_{i}$ are independent variables, and $ϵ_{i}$ is the error term for observation $i$ ; $α$ is the constant, and $β_{1}$ and $β_{2}$ are the coefficients associated with $X$ and $Z$ , respectively.

Multiple Regression

For simple regression, we thought of $β$ as the steepness of the best fitting line that ran through a scatterplot;
For multiple regression, it is the same idea, but now it is multi-dimensional:
- Rather than the two dimensions ( $x$ and $y$ axes) visible in a scatterplot, we are moving to three or more dimensions.

Multiple Regression

Example of three-dimensional space:

Multiple Regression

Same "for every one-unit change" interpretation, but now controlling for (holding constant) the other independent variable:
- $β_{1}$ is the effect of $X$ on $Y$ , holding $Z$ constant;
- $β_{2}$ is the effect of $Z$ on $Y$ , holding $X$ constant.
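
To see what "controlling for" does in practice, here is a sketch on simulated data in which $Z$ affects both $X$ and $Y$. All names and the true coefficient values (2 for $X$ , 3 for $Z$ ) are assumptions chosen for illustration:

```r
# Simulated data (hypothetical): z drives both x and y
set.seed(3)
n <- 5000
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)
y <- 1 + 2 * x + 3 * z + rnorm(n)   # true beta1 = 2, true beta2 = 3

# Simple regression: the slope on x also picks up part of z's effect
b_short <- unname(coef(lm(y ~ x))["x"])

# Multiple regression: the slope on x is estimated holding z constant
b_long <- unname(coef(lm(y ~ x + z))["x"])

round(c(simple = b_short, multiple = b_long), 2)
```

In this setup the simple-regression slope is well above 2 because it mixes the two effects, while the multiple-regression slope is close to the true value of 2.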

Lab

The Social Pressure Experiment
Gerber, Green, and Larimer (2008)
Randomization of Treatments Enables Causal Interpretation

social <- read.csv("data/social.csv")
social$messages <- as.factor(social$messages)
levels(social$messages) # base level is "Civic Duty"

## [1] "Civic Duty" "Control"    "Hawthorne"  "Neighbors"

fit <- lm(primary2008 ~ messages, data = social)
round(coef(fit),3)

##       (Intercept)   messagesControl messagesHawthorne messagesNeighbors 
##             0.315            -0.018             0.008             0.063

The intercept is the average outcome in the baseline category, Civic Duty; each other coefficient is the difference from that baseline

Let's make Control the baseline category

Create a binary indicator variable for each of the 4 categories

social$control <- as.numeric(social$messages == "Control")
social$civic <- as.numeric(social$messages == "Civic Duty")
social$hawthorne <- as.numeric(social$messages == "Hawthorne")
social$neighbors <- as.numeric(social$messages == "Neighbors")
## leave out the control indicator so that Control becomes the baseline
fit1 <- lm(primary2008 ~ civic + hawthorne + neighbors, data = social)
round(coef(fit1), 3)

## (Intercept)       civic   hawthorne   neighbors 
##       0.297       0.018       0.026       0.081
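
An alternative to building the indicators by hand is to change the reference level of the factor with `relevel()`. The sketch below uses a simulated stand-in for the experiment (the data frame `toy`, its `voted` column, and the 0.3 turnout probability are hypothetical) and checks that both routes give the same coefficients:

```r
# Simulated stand-in data (hypothetical), same four group labels
set.seed(4)
groups <- c("Civic Duty", "Control", "Hawthorne", "Neighbors")
toy <- data.frame(messages = factor(sample(groups, 400, replace = TRUE)))
toy$voted <- rbinom(400, size = 1, prob = 0.3)

# Route 1: hand-made indicators, omitting Control
toy$civic <- as.numeric(toy$messages == "Civic Duty")
toy$hawthorne <- as.numeric(toy$messages == "Hawthorne")
toy$neighbors <- as.numeric(toy$messages == "Neighbors")
fit_ind <- lm(voted ~ civic + hawthorne + neighbors, data = toy)

# Route 2: make Control the reference level of the factor
toy$messages2 <- relevel(toy$messages, ref = "Control")
fit_rel <- lm(voted ~ messages2, data = toy)

# Same estimates, only the coefficient names differ
same_coefs <- isTRUE(all.equal(unname(coef(fit_ind)),
                               unname(coef(fit_rel))))
same_coefs
```

Either way, the intercept is the Control group mean and each remaining coefficient is that group's difference from Control.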

Fitted Values

The predicted values give the average outcome under each condition

predict(fit, newdata =  data.frame(messages = 
                                     unique(social$messages)))

##         1         2         3         4 
## 0.3145377 0.3223746 0.2966383 0.3779482

tapply(social$primary2008, social$messages, mean)

## Civic Duty    Control  Hawthorne  Neighbors 
##  0.3145377  0.2966383  0.3223746  0.3779482
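
The two outputs above hold the same four numbers in a different order, because `unique()` returns the groups in order of first appearance in the data while `tapply()` follows the factor levels. A sketch on simulated stand-in data (the data frame `toy` and its columns are hypothetical) that predicts in level order and labels the result:

```r
# Simulated stand-in data (hypothetical), same four group labels
set.seed(5)
groups <- c("Civic Duty", "Control", "Hawthorne", "Neighbors")
toy <- data.frame(messages = factor(sample(groups, 400, replace = TRUE)))
toy$voted <- rbinom(400, size = 1, prob = 0.3)

fit_toy <- lm(voted ~ messages, data = toy)

# One row per group, in the order of the factor levels
new_groups <- data.frame(
    messages = factor(levels(toy$messages), levels = levels(toy$messages))
)
fitted_means <- predict(fit_toy, newdata = new_groups)
names(fitted_means) <- levels(toy$messages)

# Group means computed directly
group_means <- tapply(toy$voted, toy$messages, mean)

# Fitted values equal the group means, now in matching order
means_match <- isTRUE(all.equal(unname(fitted_means),
                                as.vector(group_means)))
means_match
```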

Your Turn

Create a new variable, age, equal to 2008 minus the respondent's year of birth.
Estimate the same model of the experimental treatment on turnout, but now also control for respondent's age.
Did the effect of each treatment change?
What is the effect of age on turnout?
What is the expected turnout for 18-year-olds? For 40-year-olds?
