Reputation: 213
This is a pretty straightforward question and I know some will be inclined to give a -1, but please let me explain better.
Most statsmodels tutorials on the internet (such as this, this and this) fit a linear regression without splitting the dataset into train and test sets. They fit the model using this syntax:
import statsmodels.formula.api as sm
sm.ols('y~x1+x2+x3', data=df).fit()
It goes without saying how risky it is to build a model without evaluating it on a held-out test set.
My question is: how can I fit a linear regression with statsmodels using a train/test split?
After searching a lot, I found this approach:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
features, target, train_size=0.8, random_state=42
)
import statsmodels.formula.api as smf
smfOLS = smf.OLS(X_train, y_train).fit()
However, I'm getting this error:
AttributeError: module 'statsmodels.formula.api' has no attribute 'OLS'
I know I should provide a dataset, but unfortunately, I'm working with confidential data. But any dataset you have should be enough to understand the situation.
Upvotes: 0
Views: 3630
Reputation: 5174
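Try this. The uppercase OLS class lives in statsmodels.api, not statsmodels.formula.api, and it takes the target first and the features second: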
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
features, target, train_size=0.8, random_state=42
)
import statsmodels.api as sm
smfOLS = sm.OLS(y_train, X_train).fit()
Upvotes: 2