Reputation: 47
I have individual-level data and want to analyze the effect of state-level educational expenditures on individual students' performance. Performance is a binary variable (0 if the student does not pass the test, 1 if the student passes). I run the following glm model with standard errors clustered at the state level:
library(miceadds)
df_logit <- data.frame(performance = c(0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0),
state = c("MA", "MA", "MB", "MC", "MB", "MD", "MA", "MC", "MB", "MD", "MB", "MC", "MA", "MA", "MA", "MA", "MD", "MA","MB","MA","MA","MD","MC","MA","MA","MC","MB","MB","MD", "MB"),
expenditure = c(123000, 123000,654000, 785000, 654000, 468000, 123000, 785000, 654000, 468000, 654000, 785000,123000,123000,123000,123000, 468000,123000, 654000, 123000, 123000, 468000,785000,123000, 123000, 785000, 654000, 654000, 468000,654000),
population = c(0.25, 0.25, 0.12, 0.45, 0.12, 0.31, 0.25, 0.45, 0.12, 0.31, 0.12, 0.45, 0.25, 0.25, 0.25, 0.25, 0.31, 0.25, 0.12, 0.25, 0.25, 0.31, 0.45, 0.25, 0.25, 0.45, 0.12, 0.12, 0.31, 0.1),
left_wing = c(0.10, 0.10, 0.12, 0.18, 0.12, 0.36, 0.10, 0.18, 0.12, 0.36, 0.12, 0.18, 0.10, 0.10, 0.10, 0.10, 0.36, 0.10, 0.12, 0.10, 0.10, 0.36, 0.18, 0.10, 0.10,0.18, 0.12, 0.12, 0.36, 0.12))
df_logit$performance <- as.factor(df_logit$performance)
glm_clust_1 <- miceadds::glm.cluster(data=df_logit, formula=performance ~ expenditure + population,
cluster="state", family=binomial(link = "logit"))
summary(glm_clust_1)
Since I cannot rule out the possibility that expenditures are endogenous, I would like to use the share of left-wing parties at the state level as an instrument for education expenditures.
However, I have not found a command in ivtools or other packages to run two-stage least squares with control variables in a logistic regression with state-level clustered standard errors.
Which commands can I use to extend my logit model with the instrument "left_wing" (also included in the example dataset) and, at the same time, output the common diagnostics such as the Wu-Hausman test or the weak-instrument test (as ivreg does for OLS)?
Ideally, I could adapt the following command to a binary dependent variable and cluster the standard errors at the state level:
library(AER)  # provides ivreg()
iv_1 <- ivreg(performance ~ population + expenditure | left_wing + population, data=df_logit)
summary(iv_1, cluster="state", diagnostics = TRUE)
Upvotes: 1
Views: 618
Reputation: 1036
Try this?
require(mlogit)
require(ivprobit)
test <- ivprobit(performance ~ population | expenditure | left_wing + population, data = df_logit)
summary(test)
I wasn't completely sure about the clustering part, but according to this thread on Cross Validated, it might not be necessary. Please have a read and let me know what you think.
Essentially, what I understood is that, since the likelihood of binary data is already fully specified, there is no need to account for the clusters. This only holds when your model is "correct", however. If you believe there is something in the joint distribution that the model does not account for, then you should cluster, though from my reading it doesn't seem possible to implement clustering on an IV logit model in R.
As for the model itself, there is a really good explanation in this SO question: How can I use the "ivprobit" function in "ivprobit" package in R?
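If clustering does turn out to matter, one possible workaround is to build the two stages by hand and bootstrap over states. The sketch below is not a packaged IV logit and is not taken from the threads mentioned here; it swaps in a two-stage residual inclusion (control function) approach with a cluster bootstrap, and it runs on simulated data (the data-generating process, the helper name tsri, and the 200 bootstrap replications are all assumptions for illustration). The coefficients are only approximate for a logit, and the coefficient on the first-stage residual v_hat is often read as an informal endogeneity check.

```r
# Simulated data (an assumption for illustration; not the question's data).
set.seed(42)
n_states <- 40
n_per    <- 50
state    <- rep(seq_len(n_states), each = n_per)

left_wing_s  <- runif(n_states)   # instrument, varies by state
population_s <- runif(n_states)   # exogenous control
u_s          <- rnorm(n_states)   # unobserved state-level confounder
expend_s     <- 1 + 2 * left_wing_s + 0.5 * population_s + u_s

d <- data.frame(
  state       = state,
  left_wing   = left_wing_s[state],
  population  = population_s[state],
  expenditure = expend_s[state]
)
d$performance <- rbinom(
  nrow(d), 1,
  plogis(-1 + 0.8 * d$expenditure + 0.3 * d$population - 0.5 * u_s[state])
)

# Two-stage residual inclusion (hypothetical helper):
# stage 1 regresses the endogenous variable on the instrument and controls,
# stage 2 adds the stage-1 residual to the logit as a control function.
tsri <- function(dat) {
  stage1 <- lm(expenditure ~ left_wing + population, data = dat)
  dat$v_hat <- residuals(stage1)
  stage2 <- glm(performance ~ expenditure + population + v_hat,
                family = binomial(link = "logit"), data = dat)
  coef(stage2)
}

est <- tsri(d)

# Cluster bootstrap: resample whole states with replacement and
# re-run both stages, so the SEs reflect first-stage uncertainty too.
rows_by_state <- split(seq_len(nrow(d)), d$state)
boot <- replicate(200, {
  ids <- sample(names(rows_by_state), replace = TRUE)
  tsri(d[unlist(rows_by_state[ids]), ])
})
se <- apply(boot, 1, sd)

print(cbind(estimate = est, boot_se = se))
```

With only four states, as in the example dataset, a cluster bootstrap like this would be unreliable, so treat it as a template for data with many more clusters.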
From my reading as well, there should be almost no difference between the end results of a logit vs. a probit model.
The basic breakdown of the ivprobit formula, y ~ x | y1 | x2, is as follows:
y = d2 = dichotomous l.h.s.
x = ltass+roe+div = the r.h.s. exogenous variables
y1 = eqrat+bonus = the r.h.s. endogenous variables
x2 = tass+roe+div+gap+cfa = the complete set of instruments
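Applied to the question's variables, the same breakdown can be sketched in base R as follows. The object names y, x, y1, x2 and iv_formula are only illustrative, and the final ivprobit call is left as a comment because it needs the ivprobit package and the df_logit data frame from the question.

```r
# Pieces of the formula, named as in the breakdown (y, x, y1, x2).
y  <- "performance"                  # dichotomous l.h.s.
x  <- "population"                   # r.h.s. exogenous variables
y1 <- "expenditure"                  # r.h.s. endogenous variables
x2 <- c("left_wing", "population")   # complete set of instruments

# Assemble y ~ x | y1 | x2 from the pieces.
iv_formula <- as.formula(paste(
  y, "~",
  paste(x,  collapse = " + "), "|",
  paste(y1, collapse = " + "), "|",
  paste(x2, collapse = " + ")
))
print(iv_formula)

# The model would then be fitted with (requires the ivprobit package):
# fit <- ivprobit::ivprobit(iv_formula, data = df_logit)
# summary(fit)
```

Note that the exogenous control (population) appears twice: once as a regressor and once in the instrument set, just as tass/roe/div do in the breakdown.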
Feel free to comment on, edit, or give feedback on this answer, as I'm definitely not an expert in applications of causal analysis and it's been a long time since I've implemented one. I also have not explored which post-hoc tests are available for this final model, so that part is still left open.
Upvotes: 1