Reputation: 107
Normalization only normalizes the input layer, while batch normalization normalizes the activations of each layer.
We do not learn any parameters for input normalization, so why does batch normalization need learned parameters?
Upvotes: 1
Views: 395
Reputation: 2719
This has been answered in detail at https://stats.stackexchange.com/a/310761
Deep Learning Book, Section 8.7.1:
Normalizing the mean and standard deviation of a unit can reduce the expressive power of the neural network containing that unit. To maintain the expressive power of the network, it is common to replace the batch of hidden unit activations H with γH+β rather than simply the normalized H. The variables γ and β are learned parameters that allow the new variable to have any mean and standard deviation. At first glance, this may seem useless — why did we set the mean to 0, and then introduce a parameter that allows it to be set back to any arbitrary value β?
The answer is that the new parametrization can represent the same family of functions of the input as the old parametrization, but the new parametrization has different learning dynamics. In the old parametrization, the mean of H was determined by a complicated interaction between the parameters in the layers below H. In the new parametrization, the mean of γH+β is determined solely by β. The new parametrization is much easier to learn with gradient descent.
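To make the γH+β step concrete, here is a minimal NumPy sketch (not from the book; the function and variable names are illustrative). It normalizes a mini-batch of activations to zero mean and unit variance, then applies the learned scale γ and shift β that the quoted passage describes:

    import numpy as np

    def batch_norm(H, gamma, beta, eps=1e-5):
        """Normalize a mini-batch H (shape: batch x features), then apply
        the learned affine transform gamma * H_hat + beta."""
        mean = H.mean(axis=0)                    # per-feature mean over the batch
        var = H.var(axis=0)                      # per-feature variance over the batch
        H_hat = (H - mean) / np.sqrt(var + eps)  # zero mean, unit variance
        return gamma * H_hat + beta              # learned scale and shift

    # gamma and beta are trained by gradient descent along with the other weights;
    # gamma=1, beta=0 simply recovers the plain normalized activations.
    H = np.random.randn(32, 4)                   # toy mini-batch: 32 examples, 4 units
    gamma, beta = np.ones(4), np.zeros(4)
    out = batch_norm(H, gamma, beta)
    print(out.mean(axis=0), out.std(axis=0))     # roughly 0 and 1 at initialization

The point of the quote is visible here: the mean of the output is controlled directly by the single parameter β, rather than by a complicated interaction of all the weights below that layer, which makes it easier to adjust with gradient descent.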
Upvotes: 2