Reputation: 13078
I'm normalizing and rescaling my training set with:
# standardize: zero mean, unit variance
feat = (feat - feat.mean()) / feat.std()
# rescale to the range [-1, 1]
feat = ((feat - feat.min()) / (feat.max() - feat.min())) * 2 - 1
This works great. I transform the test set in exactly the same way, using the mean, std, min, and max from the training set. That is fine as long as the test set's range matches the training set's. However, if the untransformed feature covers a different range in the test set, I end up with values beyond [-1, 1] after rescaling. How can this be addressed?
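For concreteness, a minimal NumPy sketch of the problem; the train_feat and test_feat arrays below are made-up placeholders, not the asker's data:
import numpy as np

# hypothetical data: the test range extends beyond the training range
train_feat = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
test_feat = np.array([0.0, 2.5, 6.0])

# fit all statistics on the training set only
mu, sigma = train_feat.mean(), train_feat.std()
z_train = (train_feat - mu) / sigma
lo, hi = z_train.min(), z_train.max()

# apply the same transform to the test set
z_test = (test_feat - mu) / sigma
scaled_test = (z_test - lo) / (hi - lo) * 2 - 1
print(scaled_test)  # [-1.5, -0.25, 1.5]: the first and last values fall outside [-1, 1]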
Upvotes: 1
Views: 712
Reputation: 1878
If a large proportion of your test inputs have values beyond the extremes you used to train the model, then you should ideally retrain your model, since your train and test distributions are different.
For the occasional unusual (outlier-like) test instance, you can clip the values to the training min/max before the minmax scaling step (see the sketch below).
For plain z-score normalization this is not an issue: test values can be anything, and extremes simply produce large z-scores.
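A minimal sketch of the clipping idea, again with made-up placeholder arrays; the standardized test values are clipped to the training extremes before the minmax rescaling:
import numpy as np

train_feat = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical training data
test_feat = np.array([0.0, 2.5, 6.0])             # hypothetical test data with out-of-range values

# fit on the training set only
mu, sigma = train_feat.mean(), train_feat.std()
z_train = (train_feat - mu) / sigma
lo, hi = z_train.min(), z_train.max()

# clip the standardized test values into the training range, then rescale
z_test = np.clip((test_feat - mu) / sigma, lo, hi)
scaled_test = (z_test - lo) / (hi - lo) * 2 - 1
print(scaled_test)  # [-1.0, -0.25, 1.0]: everything stays within [-1, 1]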
Upvotes: 1
Reputation: 63
I think the only way is to normalize your data with the min and max of all the data (training and test sets together).
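A minimal sketch of that approach, with made-up placeholder arrays; only the minmax rescaling is shown, and the min and max are computed over train and test combined:
import numpy as np

train_feat = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical training data
test_feat = np.array([0.0, 2.5, 6.0])             # hypothetical test data

# compute the range over all data, train and test together
all_feat = np.concatenate([train_feat, test_feat])
lo, hi = all_feat.min(), all_feat.max()

def rescale(x):
    return (x - lo) / (hi - lo) * 2 - 1

# both splits now land in [-1, 1] by construction
print(rescale(train_feat), rescale(test_feat))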
Upvotes: 0