Reputation: 325
I am trying to finish a homework assignment, and to do so I need to use categorical variables in statsmodels (I refuse to conform and use Stata like everyone else). I have spent some time reading the documentation for both Patsy and statsmodels, and I can't figure out why this snippet of code isn't working. I have also tried breaking it down and building it with the patsy commands directly, but I get the same error.
I currently have:
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm
# This is where I'm getting data
data = pd.read_csv("http://people.stern.nyu.edu/wgreene/Econometrics/bankdata.csv")
# I want to use this form for my regression
form = "C ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"
# Do the regression
mod = sm.ols(form, data=data)
reg = mod.fit()
print(reg.summary2())
This code raises an error: TypeError: 'Series' object is not callable. There is a very similar example on the statsmodels website that seems to work fine, and I'm not sure what the difference is between what I'm doing and what they're doing.
Any help is very much appreciated.
Cheers
Upvotes: 3
Views: 6326
Reputation: 9676
The problem is that C is the name of one of the columns in your DataFrame as well as the patsy way of denoting that you want a categorical variable. The easiest fix is to rename the column:
data = data.rename(columns={'C': 'C_data'})
form = "C_data ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"
Then the call to sm.ols will just work.
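For anyone who wants to try the rename without downloading the bank data, here is a minimal sketch on a made-up DataFrame. The values are random and purely illustrative; only the column names C, Q1 and BANK mirror the question, and the formula is shortened to a single regressor to keep it small.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# Synthetic stand-in for the bank data (random values, not the real file).
rng = np.random.default_rng(0)
n = 60
data = pd.DataFrame({
    "C": rng.normal(size=n),                  # response column named "C", as in the question
    "Q1": rng.normal(size=n),                 # one numeric regressor
    "BANK": np.repeat([1, 2, 3], n // 3),     # three hypothetical banks
})

# Rename the column so that "C" in the formula refers to patsy's categorical marker again.
data = data.rename(columns={"C": "C_data"})

reg = sm.ols("C_data ~ Q1 + C(BANK)", data=data).fit()
param_names = list(reg.params.index)
print(param_names)
```

With three banks, the fitted model should contain an intercept, one coefficient for Q1 and two dummy coefficients for BANK (the first level is the reference category).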
The error message TypeError: 'Series' object is not callable can be interpreted as follows:
First, patsy interprets C as the column of the data frame. In this case that is the Series data['C'].
Then the parentheses right after C make Python try to call data['C'] as a function with the argument BANK. Series objects don't implement a __call__ method, hence the error message that the 'Series' object is not callable.
Good luck!
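The same TypeError can be reproduced without patsy or statsmodels at all, which may help convince you that the formula machinery itself is not at fault. This is only a small illustration with a made-up Series; the string "BANK" stands in for whatever argument the formula would have passed.

```python
import pandas as pd

# A made-up column standing in for data['C'].
col = pd.Series([1.0, 2.0, 3.0], name="C")

try:
    col("BANK")          # calling a Series like a function
    message = None
except TypeError as exc:
    message = str(exc)

print(message)           # 'Series' object is not callable
```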
Upvotes: 6