Yuxxxxxx

Reputation: 213

How can I train a model in statsmodels?

This is a pretty straightforward question and I know some will be inclined to give a -1, but please let me explain better.

Most statsmodels tutorials on the internet (such as this, this and this) create a linear regression without splitting the dataset into train and test sets. They create the regression using this syntax:

import statsmodels.formula.api as sm
sm.ols('y~x1+x2+x3', data=df).fit()

It goes without saying how dangerous it is to build a model without a test dataset.

My question is: how can I create a linear regression with statsmodels using a train/test split?

After searching a lot, I found this approach:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
features, target, train_size=0.8, random_state=42
)

import statsmodels.api as sm

smfOLS = smf.OLS(X_train, y_train).fit()

However, I'm getting this error:

AttributeError: module 'statsmodels.formula.api' has no attribute 'OLS'

I know I should provide a dataset, but unfortunately, I'm working with confidential data. But any dataset you have should be enough to understand the situation.

Upvotes: 0

Views: 3630

Answers (1)

anarchy

Reputation: 5174

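Try this. Use OLS from statsmodels.api rather than from statsmodels.formula.api, and pass the target first, then the features: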

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
features, target, train_size=0.8, random_state=42
)

import statsmodels.api as sm

# sm.OLS takes the target (endog) first, then the features (exog)
smfOLS = sm.OLS(y_train, X_train).fit()

Upvotes: 2
