I am training an LSTM network over time series data and would like to normalize the data, because my features are of different scales.
My data shape is
(n_samples x n_timestamps x n_features)
I would like to use a BatchNormalization layer. Should I set axis to 2 (features, as stated in the docs) or 1 (timestamps)? I would like my features to end up in the [0..1] range, since they are of very different scales.
The problem is the documentation doesn't say what this layer actually does; it only gives recommendations for CNNs.
Upvotes: 3
Views: 1592
Usually, you'd use the features dimension: axis=-1 (the last axis, which is axis 2 for your shape). The layer will treat each feature individually and normalize it based on statistics computed over every other dimension.
But it will not make them go into the range 0 to 1. It computes (x - mean) / sqrt(variance + epsilon) and then applies a learned scale factor and bias after the normalization.
For instance, take feature 0: it is normalized with its own mean and variance, then gets its own scale and bias. The same is repeated for feature 1, with another mean, another variance, another scale and bias.
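A minimal NumPy sketch of that math (this mirrors what the layer computes at inference with default-initialized parameters, not Keras internals; the toy shapes and epsilon value are assumptions):

```python
import numpy as np

# Toy batch: (n_samples, n_timestamps, n_features)
rng = np.random.default_rng(0)
x = rng.normal(loc=[10.0, -3.0], scale=[5.0, 0.5], size=(8, 4, 2))

# Per-feature statistics, computed over every axis except features
mean = x.mean(axis=(0, 1))   # shape (2,): one mean per feature
var = x.var(axis=(0, 1))     # shape (2,): one variance per feature

eps = 1e-3                   # small constant for numerical stability
x_hat = (x - mean) / np.sqrt(var + eps)

# Learned per-feature scale (gamma) and bias (beta); ones/zeros at init
gamma = np.ones(2)
beta = np.zeros(2)
y = gamma * x_hat + beta

print(y.mean(axis=(0, 1)))   # ~0 for each feature
print(y.std(axis=(0, 1)))    # ~1 for each feature
```

Note that both features end up with roughly zero mean and unit variance, regardless of their original scales, but the values are not confined to [0, 1].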
If you use the timesteps dimension, it will see each step individually and learn one scale factor per step. That would not make much sense: steps should all have a similar nature, unlike features, which can mean completely different things.
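A quick shape check (plain NumPy, just counting parameter sets; the toy shape is an assumption) makes the difference between the two axis choices concrete:

```python
import numpy as np

x = np.zeros((8, 4, 2))  # (n_samples, n_timestamps, n_features)

# Normalizing along features: stats reduced over axes (0, 1)
per_feature = x.mean(axis=(0, 1))
print(per_feature.shape)  # (2,) -> one parameter set per feature

# Normalizing along timesteps: stats reduced over axes (0, 2)
per_step = x.mean(axis=(0, 2))
print(per_step.shape)     # (4,) -> one parameter set per timestep position
```

With the timesteps axis you would learn parameters tied to absolute step positions, which is rarely what you want for a sequence model.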
If you do need values between 0 and 1, you can simply apply an Activation('sigmoid'). If you fear your values will be too saturated, you can apply a BatchNormalization() followed by an Activation('sigmoid').
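A small sketch of why the sigmoid step gives the [0..1] range (plain NumPy; the sample values are assumptions standing in for batch-normalized outputs):

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic: maps any real value into (0, 1)
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-z)),
                    np.exp(z) / (1.0 + np.exp(z)))

x_hat = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # e.g. normalized values
y = sigmoid(x_hat)
print(y)  # every value strictly between 0 and 1
```

Normalizing first keeps the inputs near zero, where the sigmoid is close to linear, so fewer values land in the flat saturated tails.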
Upvotes: 3