Bharathwajan

Reputation: 53

Strange behavior of the 'inverse_transform' function in sklearn.preprocessing.MinMaxScaler

I used MinMaxScaler from sklearn.preprocessing to normalize some of my variables (arrays) so I could use them in a linear regression model. After creating and training the model, I tested it with x_test (split using train_test_split) and stored the result in a variable (say, predicted). To evaluate the predictions against the original dataset, I used MinMaxScaler.inverse_transform. That function works well when my code is in the following order:

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,train_size=0.75,random_state=27)

sc=MinMaxScaler(feature_range=(0,1))
x_train=sc.fit_transform(x_train)
x_test=sc.fit_transform(x_train)
y_train=y_train.reshape(-1,1)
y_train=sc.fit_transform(y_train)

When I change the order as in the code below, it throws this error: non-broadcastable output operand with shape (379,1) doesn't match the broadcast shape (379,13)

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,train_size=0.75,random_state=27)

sc=MinMaxScaler(feature_range=(0,1))
x_train=sc.fit_transform(x_train)
y_train=y_train.reshape(-1,1)
y_train=sc.fit_transform(y_train)
x_test=sc.fit_transform(x_train)

Please compare the two screenshots to better understand my question:


Upvotes: 0

Views: 1111

Answers (1)

Niko Fohr

Reputation: 34008

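The linked screenshot shows that you use the same MinMaxScaler to fit and transform both the training and test x-data, and also the training y-data (which does not make sense). A scaler only keeps the parameters from its most recent fit, so inverse_transform() uses whatever was fitted last: y_train (1 column) in your first ordering, which happens to match the shape of your predictions, but x_train (13 columns) in your second ordering, hence the shape mismatch.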
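A minimal sketch of why the order matters, using random arrays as stand-ins. The shapes are assumptions chosen to mirror the (379,1) vs (379,13) error message, and `predicted` is a hypothetical placeholder for the model output:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
x_train = rng.random((379, 13))   # 13 feature columns (assumed, as in the error)
y_train = rng.random((379, 1))    # a single target column
predicted = rng.random((379, 1))  # hypothetical stand-in for lr.predict(...)

sc = MinMaxScaler(feature_range=(0, 1))

# Order 1: y_train is fitted last, so sc now holds 1-column min/scale values.
sc.fit_transform(x_train)
sc.fit_transform(y_train)
print(sc.n_features_in_)  # 1
restored = sc.inverse_transform(predicted)  # works: (379, 1) matches 1 column

# Order 2: x_train is fitted last, so sc now holds 13-column min/scale values.
sc.fit_transform(y_train)
sc.fit_transform(x_train)
print(sc.n_features_in_)  # 13
raised = False
try:
    sc.inverse_transform(predicted)  # (379, 1) cannot broadcast against 13 columns
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

Each fit_transform() call replaces the previously learned parameters, so one scaler cannot serve both x and y.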

The correct process would be:

  1. Fit the scaler on the training x-data. Note that fit_transform() also transforms (scales) x_train.
from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range=(0,1))
x_train = sc.fit_transform(x_train)
  2. Also scale the test x-data with the same scaler. Do not fit here; only transform.
x_test = sc.transform(x_test)
  3. If you think the y-data also needs scaling, you will have to fit a separate scaler for it. It could also be that the y-data does not need scaling at all.
# Option A: Do not scale y-data
# (do nothing)

# Option B: Scale y-data
sc_y = MinMaxScaler(feature_range=(0,1))
# MinMaxScaler expects 2-D input, so reshape a 1-D target first
y_train = sc_y.fit_transform(y_train.reshape(-1, 1))
  4. After you have trained your model (lr), you can make predictions with it on the scaled x_test:
# Option A:
predicted = lr.predict(x_test)

# Option B:
predicted_scaled = lr.predict(x_test)
predicted = sc_y.inverse_transform(predicted_scaled)
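Putting the four steps together, here is a hedged end-to-end sketch following Option B. It uses synthetic data in place of the original dataset; the 506-row, 13-column shape, the generated target, and the mean-absolute-error check are assumptions for illustration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 506 rows, 13 features, 1-D target.
rng = np.random.default_rng(27)
x = rng.random((506, 13))
y = x @ rng.random(13) * 50 + rng.normal(0, 1, 506)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=27
)

# Step 1: fit the x-scaler on the training features only.
sc = MinMaxScaler(feature_range=(0, 1))
x_train = sc.fit_transform(x_train)

# Step 2: reuse the same x-scaler on the test features (transform, no fit).
x_test = sc.transform(x_test)

# Step 3 (Option B): a separate scaler for the target, which must be 2-D.
sc_y = MinMaxScaler(feature_range=(0, 1))
y_train_scaled = sc_y.fit_transform(y_train.reshape(-1, 1))

# Step 4: train, predict on the scaled test features, then undo the y-scaling.
lr = LinearRegression()
lr.fit(x_train, y_train_scaled)
predicted_scaled = lr.predict(x_test)  # 2-D, one column, because y was 2-D
predicted = sc_y.inverse_transform(predicted_scaled).ravel()

# Compare with the untouched y_test on the original scale.
mae = np.mean(np.abs(predicted - y_test))
print(predicted.shape, y_test.shape, mae)
```

Because sc_y was only ever fitted on the 1-column target, its inverse_transform() always matches the shape of the predictions, regardless of how many x-features there are.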

Upvotes: 1
