Chris

Reputation: 13660

Score Statsmodels Logit

I can't seem to figure out the syntax to score a logistic regression model.

logit = sm.Logit(data[response],sm.add_constant(data[features]))
model = logit.fit()
preds = model.predict(data[features])

This is the traceback I am getting:

      2 logit = sm.Logit(data[response],sm.add_constant(data[features]))
      3 model = logit.fit()
----> 4 preds = model.predict(data[features])

    878             exog = dmatrix(self.model.data.orig_exog.design_info.builder,
    879                     exog)
--> 880         return self.model.predict(self.params, exog, *args, **kwargs)
    881
    882

    376             exog = self.exog
    377         if not linear:
--> 378             return self.cdf(np.dot(exog, params))
    379         else:
    380             return np.dot(exog, params)

ValueError: matrices are not aligned

Upvotes: 1

Views: 5778

Answers (2)

Josef

Reputation: 22897

You are including the constant in the estimation but not in the prediction.

The explanatory variables used for prediction must have the same number of columns as those used in the estimation, including a constant if one was used there:

preds = model.predict(sm.add_constant(data[features]))

It is often useful to add a constant column to the data frame so we have a consistent set of variables including the constant.

Related: If a model was specified through the formula interface, the transformations in the formula (including adding the constant) are also applied automatically in the call to predict.
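A small sketch of that behaviour, assuming a hypothetical single predictor x1 that enters the model through np.log (again with synthetic data): predict receives the raw column and the formula machinery takes care of the log transform and the intercept.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(2)
data = pd.DataFrame({"x1": rng.uniform(1.0, 10.0, size=300)})
linear = -1.0 + 1.2 * np.log(data["x1"])
data["y"] = (rng.uniform(size=300) < 1 / (1 + np.exp(-linear))).astype(int)

# The formula adds the intercept and applies np.log to x1.
model = smf.logit("y ~ np.log(x1)", data=data).fit(disp=0)

# New data only needs the raw x1 column; no constant, no manual log.
new_data = pd.DataFrame({"x1": [1.0, 5.0, 9.0]})
preds = model.predict(new_data)
```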

Upvotes: 2

jseabold

Reputation: 8283

It looks like you also need to add the constant to the data you pass to the predict method. Assuming you're working with pandas, it might be easier to do:

data['constant'] = 1

Then add 'constant' to your features list, so the same columns are used for fitting and predicting. Alternatively, you can use the formula interface at statsmodels.formula.api.logit.

Upvotes: 2
