Reputation: 13
Simple example below using MinMaxScaler, PolynomialFeatures and a LinearRegression model.
pipeLine = make_pipeline(MinMaxScaler(),PolynomialFeatures(), LinearRegression())
pipeLine.fit(X_train,Y_train)
print(pipeLine.score(X_test,Y_test))
print(pipeLine.steps[2][1].intercept_)
print(pipeLine.steps[2][1].coef_)
0.4433729905419167
3.4067909278765605
[ 0. -7.60868833 5.87162697]
X_trainScaled = MinMaxScaler().fit_transform(X_train)
X_trainScaledandPoly = PolynomialFeatures().fit_transform(X_trainScaled)
X_testScaled = MinMaxScaler().fit_transform(X_test)
X_testScaledandPoly = PolynomialFeatures().fit_transform(X_testScaled)
reg = LinearRegression()
reg.fit(X_trainScaledandPoly,Y_train)
print(reg.score(X_testScaledandPoly,Y_test))
print(reg.intercept_)
print(reg.coef_)
print(reg.intercept_ == pipeLine.steps[2][1].intercept_)
print(reg.coef_ == pipeLine.steps[2][1].coef_)
0.44099256691782807
3.4067909278765605
[ 0. -7.60868833 5.87162697]
True
[ True True True]
Upvotes: 1
Views: 986
Reputation: 5916
The problem lies in your manual steps: you refit the scaler on the test data. Fit it on the training data only and use that fitted instance to transform the test data. This is also why your coefficients and intercept match (the training side is identical in both versions) while the test scores differ. See here for details: "How to normalize the Train and Test data using MinMaxScaler sklearn" and "StandardScaler before and after splitting data".
from sklearn.datasets import make_classification, make_regression
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
X, y = make_regression(n_features=3, n_samples=50, n_informative=1, noise=1)
X_train, X_test, Y_train, Y_test = train_test_split(X, y)
pipeLine = make_pipeline(MinMaxScaler(),PolynomialFeatures(), LinearRegression())
pipeLine.fit(X_train,Y_train)
print(pipeLine.score(X_test,Y_test))
print(pipeLine.steps[2][1].intercept_)
print(pipeLine.steps[2][1].coef_)
# Fit the scaler on the training data only, then reuse it for the test data
scaler = MinMaxScaler().fit(X_train)
X_trainScaled = scaler.transform(X_train)
poly = PolynomialFeatures().fit(X_trainScaled)
X_trainScaledandPoly = poly.transform(X_trainScaled)
X_testScaled = scaler.transform(X_test)
X_testScaledandPoly = poly.transform(X_testScaled)
reg = LinearRegression()
reg.fit(X_trainScaledandPoly,Y_train)
print(reg.score(X_testScaledandPoly,Y_test))
print(reg.intercept_)
print(reg.coef_)
print(reg.intercept_ == pipeLine.steps[2][1].intercept_)
print(reg.coef_ == pipeLine.steps[2][1].coef_)
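To see why refitting the scaler changes the score, here is a minimal sketch on made-up toy data (the arrays below are assumptions for illustration, not taken from the question). It compares transforming the test set with the train-fitted scaler against refitting a new scaler on the test set:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (hypothetical): the test set covers a narrower range than the train set.
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[2.0], [4.0]])

# Reuse the scaler fitted on the training data (what a Pipeline does internally).
scaler = MinMaxScaler().fit(X_train)
good = scaler.transform(X_test)

# Refit on the test data: its own min and max get mapped to 0 and 1.
bad = MinMaxScaler().fit_transform(X_test)

print(good.ravel())  # [0.2 0.4]
print(bad.ravel())   # [0. 1.]
```

The model was trained on features scaled with the training min and max, so feeding it the second version puts the test points in the wrong place on that scale, which is enough to shift the score.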
Upvotes: 2