Reputation: 53
I used MinMaxScaler from sklearn.preprocessing to normalize some of my variables (arrays) for use in a linear regression model. After creating and training the model,
I tested it with x_test (split using train_test_split)
and stored the result in a variable (say, predicted).
To evaluate the predictions against the original dataset, I used MinMaxScaler.inverse_transform.
That function works well when my code is in the order below:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,train_size=0.75,random_state=27)
sc=MinMaxScaler(feature_range=(0,1))
x_train=sc.fit_transform(x_train)
x_test=sc.fit_transform(x_train)
y_train=y_train.reshape(-1,1)
y_train=sc.fit_transform(y_train)
When I changed the order as in the code below, it throws this error: non-broadcastable output operand with shape (379,1) doesn't match the broadcast shape (379,13)
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,train_size=0.75,random_state=27)
sc=MinMaxScaler(feature_range=(0,1))
x_train=sc.fit_transform(x_train)
y_train=y_train.reshape(-1,1)
y_train=sc.fit_transform(y_train)
x_test=sc.fit_transform(x_train)
Please compare the two screenshots for a better understanding of my query:
Upvotes: 0
Views: 1111
Reputation: 34008
It can be seen from the linked screenshot that you use the same MinMaxScaler to fit and transform both the train and test x-data, and also the training y-data (which does not make sense).
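To see why reusing one scaler leads to the error in the question, here is a minimal sketch on synthetic data. The array shapes are an assumption chosen to mirror the error message (379 rows, 13 feature columns), and the random values stand in for the real dataset. Each call to fit_transform() overwrites what the scaler learned before, so the last fit decides how many columns inverse_transform() expects.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(379, 13))  # synthetic features (shape assumed from the error message)
y = rng.normal(size=(379, 1))   # synthetic target, already 2-D

sc = MinMaxScaler(feature_range=(0, 1))
sc.fit_transform(y)  # first fit: the scaler learns 1 column
sc.fit_transform(X)  # second fit overwrites it: the scaler now expects 13 columns

try:
    sc.inverse_transform(y)  # 1 column against 13 stored minima
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(sc.n_features_in_)
print(error_message)
```

If the fit on y comes last instead, the same scaler expects 1 column and inverse_transform() on the predictions happens to work, which is probably why the first ordering in the question did not fail.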
The correct process would be:
Fit the scaler on x_train only. fit_transform() also transforms (scales) the x_train.
sc = MinMaxScaler(feature_range=(0,1))
x_train = sc.fit_transform(x_train)
Then scale x_test with the already fitted scaler. Do not fit here; just scale/transform.
x_test = sc.transform(x_test)
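As a small illustration of the difference between fitting and transforming, the sketch below uses made-up one-column arrays (the values are purely illustrative). transform() reuses the minimum and maximum learned from the training data, so test values outside the training range can land outside [0, 1], which is expected.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative values only; not taken from the question's dataset
x_train = np.array([[0.0], [5.0], [10.0]])
x_test = np.array([[2.5], [12.0]])

sc = MinMaxScaler(feature_range=(0, 1))
x_train_scaled = sc.fit_transform(x_train)  # learns min=0 and max=10 from the training data
x_test_scaled = sc.transform(x_test)        # reuses those statistics, no refit

print(x_test_scaled.ravel())
```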
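If the y-data also needs scaling, use a separate scaler for it:
# Option A: Do not scale y-data
# (do nothing)
# Option B: Scale y-data with its own scaler
# (y_train must be 2-D here, e.g. after y_train.reshape(-1,1))
sc_y = MinMaxScaler(feature_range=(0,1))
y_train = sc_y.fit_transform(y_train)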
After training the model (lr), you can make predictions with the scaled x_test and the model:
# Option A:
predicted = lr.predict(x_test)
# Option B:
y_test_scaled = lr.predict(x_test)
predicted = sc_y.inverse_transform(y_test_scaled)
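Putting the steps together, a self-contained sketch of Option B might look like the following. The data is synthetic (506 rows and 13 columns are assumptions picked to match the shapes in the question), and the mean squared error at the end is just one possible way to compare predictions with the unscaled y_test.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real dataset (assumed shapes)
rng = np.random.default_rng(27)
x = rng.uniform(0, 100, size=(506, 13))
y = x @ rng.normal(size=13) + rng.normal(scale=0.1, size=506)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=27
)

# One scaler for the features: fit on train, only transform test
sc = MinMaxScaler(feature_range=(0, 1))
x_train_s = sc.fit_transform(x_train)
x_test_s = sc.transform(x_test)

# A separate scaler for the target (needs a 2-D array)
sc_y = MinMaxScaler(feature_range=(0, 1))
y_train_s = sc_y.fit_transform(y_train.reshape(-1, 1))

lr = LinearRegression().fit(x_train_s, y_train_s)

# Predictions come out in scaled units; map them back to the original units
predicted = sc_y.inverse_transform(lr.predict(x_test_s))

mse = mean_squared_error(y_test, predicted.ravel())
print(predicted.shape, mse)
```

Because sc_y was fit only on the target, inverse_transform() sees the single column it expects, regardless of the order in which the two scalers were fit.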
Upvotes: 1