Reputation: 13660
I can't seem to figure out the syntax to score a logistic regression model.
logit = sm.Logit(data[response],sm.add_constant(data[features]))
model = logit.fit()
preds = model.predict(data[features])
This is the traceback I am getting (sorry for the ugly format, didn't know how to fix it...)
2 logit = sm.Logit(data[response],sm.add_constant(data[features]))
3 model = logit.fit()
----> 4 preds = model.predict(data[features])
878 exog = dmatrix(self.model.data.orig_exog.design_info.builder,
879 exog)
--> 880 return self.model.predict(self.params, exog, *args, **kwargs)
881
882
376 exog = self.exog
377 if not linear:
--> 378 return self.cdf(np.dot(exog, params))
379 else:
380 return np.dot(exog, params)
ValueError: matrices are not aligned
Upvotes: 1
Views: 5778
Reputation: 22897
You are including the constant in the estimation but not in the prediction.
The explanatory variables used for prediction need the same columns as those used in estimation, including a constant if one was used:
preds = model.predict(sm.add_constant(data[features]))
It is often useful to add a constant column to the data frame so we have a consistent set of variables including the constant.
Related: if the model was specified through the formula interface, predict automatically applies the same transformations to the data it is given.
Upvotes: 2
Reputation: 8283
It looks like you also need to add the constant to the data you pass to predict. Assuming you're working with pandas, it might be easier to do
data['constant'] = 1
and add 'constant' to your features list. Alternatively, you can use the formula interface at statsmodels.formula.api.logit, which adds the intercept for you.
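For the formula route, a rough sketch could look like the following. The data is synthetic and the column names x1, x2 and y are assumptions for illustration, so swap in your own frame and formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, purely illustrative.
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
lin_pred = -0.3 + 0.9 * data["x1"] + 0.6 * data["x2"]
data["y"] = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-lin_pred))).astype(int)

# The formula adds an intercept by default, so no constant column is needed.
model = smf.logit("y ~ x1 + x2", data=data).fit(disp=0)

# predict takes the raw columns; the intercept is handled by the formula machinery.
preds = model.predict(data[["x1", "x2"]])

print(model.params.index.tolist())
```

With this approach the features passed to predict are the plain columns named in the formula, so there is no constant to keep in sync by hand.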
Upvotes: 2