Reputation: 123
I'm trying to run a two-stage least squares (2SLS) regression in Python using the statsmodels library:
from statsmodels.sandbox.regression.gmm import IV2SLS
resultIV = IV2SLS(dietdummy['Log Income'],
                  dietdummy.drop(['Log Income', 'Diabetes'], axis=1),
                  dietdummy.drop(['Log Income', 'Reads Nutri'], axis=1))
Reads Nutri is my endogenous variable, my instrument is Diabetes, and my dependent variable is Log Income.
Did I do this right? It is much different than the way I would do it on Stata.
Also, when I call resultIV.summary(), I get a TypeError (something to do with the F statistic being NoneType). How can I resolve this?
Upvotes: 8
Views: 19993
Reputation: 904
Personally, I found the IV2SLS function in linearmodels 4.5 to be more intuitive than the statsmodels version, as it has separate parameters for the dependent variable and the endogenous variable(s), whereas the statsmodels version doesn't. The results I got from the linearmodels function lined up with what I would get with an Excel add-in I got through school.
If you choose to use the linearmodels function, this guide should also help. For instance, it showed me that I needed to add in a constant for my function to produce the correct output.
Upvotes: 1
Reputation: 457
I found this question when I wanted to run an IV2SLS regression myself and hit the same problem, so this is for everybody else who lands here.
The statsmodels documentation shows how to use this command. The arguments are endog, exog, and instrument, in that order, where exog includes the variables that are instrumented and instrument contains the instruments plus the other control variables. In that sense, your model is fine.
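As a rough illustration of that argument layout, here is a small sketch on synthetic data. The variable names and coefficients are invented. The manual two-stage computation at the end is the textbook estimator, included only as a cross-check, and is not meant to describe how statsmodels implements it internally.

```python
import numpy as np
from statsmodels.sandbox.regression.gmm import IV2SLS

rng = np.random.default_rng(0)
n = 5000

# Synthetic data; names and coefficients are made up for illustration.
z = rng.normal(size=n)   # instrument: affects y only through x
w = rng.normal(size=n)   # exogenous control
u = rng.normal(size=n)   # unobserved confounder, makes x endogenous
x = 0.8 * z + 0.5 * w + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * w + u + rng.normal(size=n)

const = np.ones(n)
exog = np.column_stack([const, x, w])        # regressors, incl. endogenous x
instrument = np.column_stack([const, z, w])  # x swapped for z, controls kept

# Argument order: endog (dependent variable), exog, instrument.
res = IV2SLS(y, exog, instrument).fit()
params = np.asarray(res.params)
print(params)  # order follows the exog columns: const, x, w

# Cross-check with the textbook two stages:
# 1) project exog onto the instrument matrix, 2) regress y on the projection.
exog_hat = instrument @ np.linalg.lstsq(instrument, exog, rcond=None)[0]
manual = np.linalg.lstsq(exog_hat, y, rcond=None)[0]
print(manual)
```

The coefficient on x should land near the value used to generate the data (2.0), and the two sets of estimates should agree up to floating-point error.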
The TypeError you found is currently an open bug in versions 0.6.0 and 0.8.1 and will be fixed in 0.9.0, according to the milestone.
Update (28.06.2018): Version 0.9.0 was released on 15 May and should include a fix for the aforementioned bug.
Upvotes: 9