Reputation: 3561
I am using MLPRegressor, which takes 5 continuous features and 1 categorical feature that draws its values from a set of 40 integers [0, 1, 2, ..., 39].
I was told that normalizing the features using sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1)) can help with performance, both with MLPs and LSTMs. Thus I am using it on my Xtrain matrix containing the features above.
However, it seems weird to me that I should also be normalizing a categorical variable. Should I do it? The documentation (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) says that MinMaxScaler normalizes each feature separately. Should I take out the categorical column and normalize all the others?
Also, if it normalizes each feature separately, how does it know how to transform them back when I use inverse_transform?
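For reference, this is roughly what I am doing at the moment (the data here is random, just to illustrate the shapes):
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# 5 continuous columns plus 1 categorical column with values in 0..39
rng = np.random.default_rng(0)
Xtrain = np.hstack([rng.random((100, 5)), rng.integers(0, 40, (100, 1))])

scaler = MinMaxScaler(feature_range=(0, 1))
Xtrain_scaled = scaler.fit_transform(Xtrain)  # scales all 6 columns, including the categorical one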
Upvotes: 3
Views: 7270
Reputation: 896
Categorical features should be represented with one-hot encoding. Still, if you normalize a categorical feature first, it will not harm your data: it just converts the values from one form to another while keeping them discrete. Here is a small code example:
import numpy as np
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

data = np.array([-2, -2, -78, -78, -1, -1, 0, 0, 1, 1])

# Scale the values to [0, 1]; the distinct values stay distinct
scaler = MinMaxScaler(feature_range=(0, 1))
normalizedData = scaler.fit_transform(data.reshape(-1, 1))

# One-hot encode the scaled values (fit_transform already returns a 2-D array)
encoder = OneHotEncoder(categories='auto', sparse=False)
encodedData = encoder.fit_transform(normalizedData)
print(encodedData)
Output after one-hot encoding:
[[0. 1. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[1. 0. 0. 0. 0.]
[1. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 1. 0.]
[0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1.]]
And the output stays exactly the same if I feed the data directly to the encoder, i.e. without normalizing first.
Upvotes: 3
Reputation: 387
Scaling categorical variables is unnecessary, since there is no natural metric on the space of such variables.
As for your second question: after being fitted to the data, the MinMaxScaler object keeps the attributes scale_, data_range_, data_min_ and data_max_ (each an array with one entry per feature). These attributes enable the inverse transformation for each feature.
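A minimal sketch of the round trip (the data here is made up):
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = MinMaxScaler(feature_range=(0, 1)).fit(X)

print(scaler.data_min_)  # [ 1. 10.] -- per-feature minima
print(scaler.data_max_)  # [ 3. 30.] -- per-feature maxima

X_scaled = scaler.transform(X)
X_back = scaler.inverse_transform(X_scaled)  # undoes the scaling per feature
print(np.allclose(X, X_back))  # True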
Upvotes: 0
Reputation: 146
Categorical variables should be handled accordingly, i.e. with one-hot encoding. After that, the MinMax scaler would not really change the encoded features.
Answering your last question: the scaler simply stores the minima and maxima for each input feature separately, so it can perform the inverse transform. And it makes sense to scale features independently; they may differ in scale and even in nature.
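One way to put this together for your case is a ColumnTransformer that scales the continuous columns and one-hot encodes the categorical one. This is just a sketch; it assumes columns 0-4 are the continuous features and column 5 is the categorical one, with random placeholder data:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical Xtrain: columns 0-4 continuous, column 5 categorical (0..39)
rng = np.random.default_rng(0)
Xtrain = np.hstack([rng.random((100, 5)), rng.integers(0, 40, (100, 1))])

preprocess = ColumnTransformer([
    ('scale', MinMaxScaler(feature_range=(0, 1)), [0, 1, 2, 3, 4]),
    ('onehot', OneHotEncoder(categories='auto'), [5]),
])
# 5 scaled columns plus one column per observed category
Xtrain_prepared = preprocess.fit_transform(Xtrain)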
Upvotes: 2