Reputation: 69
I am trying to run the following IV regression on an unbalanced panel dataset. The variables TOTAL_E, TOURISM_5KM_SUM and TOURISM_10KM_SUM are endogenous, HHSIZE2 and SEX are exogenous explanatory variables, and Z_b_100, tourism_5KM_ZSCORE and tourism_10KM_ZSCORE are instruments. I want to include household characteristics as controls, so I list them on both sides of the | in the formula, but I still get the "insufficient number of instruments" error every time. I am using a random effects model because my instrument is time invariant, so I don't believe the choice of model is the issue.
Based on this question: Error "insufficient number of instruments" when running plm IV regression, I am supposed to have two instruments for every endogenous variable, but that isn't possible in my case, so I am a bit stuck.
random <- plm(
  asinh(AECAPITA) ~ asinh(TOTAL_E) + asinh(TOURISM_5KM_SUM) + asinh(TOURISM_10KM_SUM) + HHSIZE2 + SEX |
    Z_b_100 + HHSIZE2 + tourism_5KM_ZSCORE + tourism_10KM_ZSCORE + SEX,
  data = df, index = c("UNIQUE_HH_ID", "YEAR"), model = "random"
)
Error in plm.fit(data, model = models[1L], effect = effect) :
insufficient number of instruments
In addition: Warning message:
In pdata.frame(data, index) :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
Upvotes: 0
Views: 158
Reputation: 3687
I suggest you first turn to the warning, as it hints that your data set is not correctly specified ("duplicate couples (id-time) in resulting pdata.frame"). There are numerous questions and answers about this on SO, e.g., https://stackoverflow.com/a/72092725/4640346. Basically, each combination of observational unit and time period must be unique. The warning also contains a hint on how to check for this.
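As a rough illustration of that check, here is a minimal base R sketch that looks for repeated id-time pairs before any pdata.frame is built. The data values are made up for the example; only the column names UNIQUE_HH_ID, YEAR and AECAPITA come from the question, and the object names pair_counts, has_duplicates and dup_rows are hypothetical.

```r
# Toy data with invented values; only the column names come from the question.
df <- data.frame(
  UNIQUE_HH_ID = c(1, 1, 2, 2, 2, 3),
  YEAR         = c(2010, 2011, 2010, 2010, 2011, 2010),
  AECAPITA     = c(5.1, 5.3, 4.8, 4.9, 5.0, 6.2)
)

# Count how often each (id, time) pair occurs; a valid panel has no count above 1.
pair_counts <- table(df$UNIQUE_HH_ID, df$YEAR, useNA = "ifany")
has_duplicates <- any(pair_counts > 1)

# Pull out every row that belongs to a repeated pair (first and later copies alike).
keys <- df[c("UNIQUE_HH_ID", "YEAR")]
dup_rows <- df[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]

print(has_duplicates)
print(dup_rows)
```

Inspecting the flagged rows usually shows whether the repeats are true duplicates that can be dropped or whether the id variable does not actually identify households uniquely.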
I suggest first creating the pdata.frame and then passing it to plm() without the index argument. This way, you will know whether the data set is specified correctly before estimating anything (this roughly corresponds to Stata's xtset).
Once the data is correctly specified, re-estimate the model with it and see whether you still get the error.
The overall approach would be something along these lines:
pdf <- pdata.frame(df, index=c("UNIQUE_HH_ID","YEAR"))
mod <- plm(formula, data = pdf, <further_specs>)
Upvotes: 1