Addem

Reputation: 3919

How do you get the adjusted R-squared for the test data in statsmodels?

I have a dataset like

import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
data = pd.DataFrame({'a':[4,3,4,6,6,3,2], 'b':[12,14,11,15,14,15,10]})
test = data.iloc[:4]
train = data.iloc[4:]

and I built the linear model for the train data

model = smf.ols("a ~ b", data = data)
print(model.fit().summary())

Now what I want to do is get the adjusted R^2 value based on the test data. Is there a simple command for this? I've been trying to build it from scratch and keep getting an error.

What I've been trying:

model.predict(test.b)

but it complains about the shape. Based on this: https://www.statsmodels.org/stable/examples/notebooks/generated/predict.html

I tried the following

X = sm.add_constant(test.b)
model.predict(X)

Now the error is

ValueError: shapes (200,2) and (200,2) not aligned: 2 (dim 1) != 200 (dim 0)

The two shapes look the same, but I don't understand what the message is saying about "dim". I followed the example in the link as closely as I could, so I'm not sure what's going wrong.

Upvotes: 4

Views: 10343

Answers (1)

AlexK

Reputation: 3011

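You should first run the .fit() method and save the returned object, then call .predict() on that object. The unfitted model's predict method expects the parameter vector as its first argument (its signature is predict(params, exog=None)), so model.predict(X) treats X as the coefficients and tries to multiply the model's own design matrix by it, which is why the shapes are reported as not aligned.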

results = model.fit()

Running results.params will produce this pandas Series:

Intercept   -0.875
b            0.375
dtype: float64

Then, running results.predict(test.b) will produce this Series:

0    3.625
1    4.375
2    3.250
3    4.750
dtype: float64
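
As a sanity check, these predictions are just the fitted line applied to the test rows: Intercept + (coefficient of b) * b. Here is a minimal sketch in plain Python, with the coefficients copied from the results.params output above and the b values taken from the first four rows of the question's data. The variable names are only illustrative.

```python
# Coefficients copied from the results.params output above.
intercept = -0.875
slope_b = 0.375

# b values of the test rows (data.iloc[:4] in the question).
test_b = [12, 14, 11, 15]

# Apply the fitted line a = intercept + slope_b * b to each test row.
predictions = [intercept + slope_b * b for b in test_b]
print(predictions)
```

This should reproduce the four values in the Series above.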

You can also retrieve model fit summary values by calling individual attributes of the results class (https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLSResults.html):

>>> results.rsquared_adj
0.08928571428571419

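But those values describe the data the model was fitted on (the full/train data), not the test set. As far as I know there is no built-in attribute for an out-of-sample adjusted R-squared, so yes, you will probably need to compute the residual and total sums of squares yourself from your test predictions and true values, and get the adjusted R-squared from those.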
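
A minimal sketch of that manual calculation, written in plain Python so it does not rely on any particular statsmodels call. The helper name adjusted_r2 and its n_predictors argument are my own and not part of statsmodels. It uses the usual definitions R^2 = 1 - SSE/SST and adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of test rows and p the number of predictors excluding the intercept. The predictions are the four values shown above, which come from a model fitted on the whole data frame as in the question's code. One convention to be aware of: SST here is measured around the mean of the test responses, which is an assumption on my part.

```python
def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R-squared of predictions y_pred against observed y_true.

    Hypothetical helper, not a statsmodels function.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    mean_true = sum(y_true) / n
    # Residual sum of squares on the test rows.
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the test-set mean (assumed convention).
    sst = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sse / sst
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Observed a for the test rows, and the predictions listed above.
test_a = [4, 3, 4, 6]
test_pred = [3.625, 4.375, 3.25, 4.75]

adj_r2_test = adjusted_r2(test_a, test_pred, n_predictors=1)
print(adj_r2_test)
```

With the objects from this answer, the same helper could presumably be called as adjusted_r2(test.a, results.predict(test.b), n_predictors=1). On this tiny example the result comes out negative, which is possible for adjusted R-squared with so few rows and is not a sign of a bug.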

Upvotes: 5
