Isn't it dangerous to apply Min Max Scaling to the test set?

Question

Here's the situation I am worrying about.

Let's say I have a model trained on min-max scaled data. I want to test my model, so I scaled the test dataset with the same scaler that was fitted during training. However, a value in my new test data turned out to be lower than the training minimum, so the scaler returned a negative value.

As far as I know, the minimum and maximum aren't very stable values, especially in a volatile dataset such as cryptocurrency data. In this case, should I update my scaler? Or should I retrain my model?

Jeff · Accepted Answer

I happen to disagree with @Sharan_Sundar. The point of scaling is to bring all of your features onto a single scale, not to rigorously ensure that they lie in the interval [0,1]. This can be very important, especially when considering regularization techniques that penalize large coefficients (whether they be linear regression coefficients or neural network weights). The combination of feature scaling and regularization helps to ensure your model generalizes to unobserved data.

Scaling based on your "test" data is not a great idea because in practice, as you pointed out, you can easily observe new data points that don't lie within the bounds of your original observations. Your model needs to be robust to this.
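To make the situation concrete, here is a minimal sketch using scikit-learn's MinMaxScaler. The numbers are made up for illustration. The scaler is fitted on the training data only and then reused on test data that falls outside the training range, so some scaled values land outside [0,1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data; the values are illustrative only.
train = np.array([[10.0], [20.0], [30.0]])   # training min = 10, max = 30
test = np.array([[5.0], [25.0], [40.0]])     # 5 is below the training min, 40 above the max

# Fit on the training data only, then reuse the same scaler for the test data.
scaler = MinMaxScaler()
scaler.fit(train)
scaled_test = scaler.transform(test)

# Each value is mapped as (x - 10) / (30 - 10), so out-of-range inputs
# produce values below 0 or above 1: approximately [-0.25, 0.75, 1.5].
print(scaled_test.ravel())
```

The negative value is not an error. It simply says the new point lies below anything seen in training, which is information the model can still use.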

In general, I would recommend considering different scaling routines. scikit-learn's MinMaxScaler is one, as is StandardScaler (subtract the mean and divide by the standard deviation). In the case where your target variable, cryptocurrency price, can vary over multiple orders of magnitude, it might be worth using the logarithm function for scaling some of your variables. This is where data science becomes an art -- there's not necessarily a 'right' answer here.
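As a rough illustration of the logarithm idea, the sketch below uses only Python's standard library and a hypothetical list of prices. It shows how a series spanning four orders of magnitude collapses into a narrow range after a base-10 log transform. The choice of base 10 is an assumption; any base works, and the transform only applies to strictly positive values.

```python
import math

# Hypothetical prices spanning several orders of magnitude.
prices = [0.5, 5.0, 50.0, 500.0, 5000.0]

# Log-transform each price (valid only for strictly positive values).
log_prices = [math.log10(p) for p in prices]

raw_ratio = max(prices) / min(prices)            # largest price relative to smallest
log_span = max(log_prices) - min(log_prices)     # width of the range after the transform

print(f"raw max/min ratio: {raw_ratio:.0f}")
print(f"log10 range width: {log_span:.2f}")
```

After the transform, a new price somewhat outside the training range shifts the feature by a small additive amount rather than by a large multiplicative one, which is why this can be easier on a model than min-max scaling the raw values.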

(EDIT) - Also see: Do you apply min max scaling separately on training and test data?
