Reputation: 105
I am trying to train an RNN that uses LSTMs.
In the data preprocessing part, when I normalize (feature-scale) the dataset, I normalize the whole dataset at once. However, I have serious doubts: some of the input columns may dominate others, and that could affect the network training. Here is an example of the dataset for better understanding:
As you can see in the figure above, the values in the differently colored columns are much greater or smaller than those in the others.
So, my question is: is it okay to normalize the whole dataset together, or should I normalize each column individually?
Upvotes: 0
Views: 1423
Reputation: 3817
Feature scaling is done on a per-column basis. The operations are applied to one feature at a time, because the objective is to bring the different features into similar ranges so that the unit of a feature does not impact learning (source). You are right that the magnitude of features can affect training, which is why scaling is considered a best practice, especially when training neural networks.
Typically this is done in one of two ways:

- Rescaling (min-max normalization), which can be done in Python using Scikit-Learn's MinMaxScaler.
- Standardization, which can be done in Python using Scikit-Learn's StandardScaler.
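As a minimal sketch (using a small made-up array, since your actual dataset isn't shown), both scalers handle the per-column part for you: `fit_transform` computes the statistics for each column independently, so a large-valued column cannot dominate a small-valued one after scaling:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data: two features (columns) on very different scales
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])

# Rescaling: each column is mapped to the [0, 1] range independently
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: each column gets zero mean and unit variance independently
X_standard = StandardScaler().fit_transform(X)

print(X_minmax)
print(X_standard)
```

Note that after either transform the two columns end up on comparable scales, which is exactly the effect you want before feeding the data to the LSTM. Also remember to fit the scaler on the training set only and reuse it (via `transform`) on validation/test data.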
Here is a good article on the basics of feature scaling: http://sebastianraschka.com/Articles/2014_about_feature_scaling.html.
Upvotes: 2