Reputation: 286
As someone who doesn't have a strong background in statistics, could someone explain to me the main limitation(s) of batch normalization that batch renormalization aims to solve, especially in terms of how it differs from batch normalization?
Upvotes: 4
Views: 2745
Reputation: 77860
Very briefly, batch normalization rescales each mini-batch to zero mean and unit variance using only that batch's own statistics, so each batch is scaled independently. With small or non-i.i.d. batches, those per-batch statistics can differ sharply from the population statistics used at inference. Batch renormalization folds the moving averages of prior batches' statistics into the computation (as correction factors applied to the batch-normalized output), so that each batch is normalized toward a standard common to all batches. This asymptotically approaches a global normalization, keeping off-center batches from skewing training away from the desired center.
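To make that concrete, here is a minimal NumPy sketch of the batch renormalization forward pass, following the idea in Ioffe's batch renormalization paper. The function name, clipping limits (`r_max`, `d_max`), and momentum value are illustrative choices, not a reference implementation; the learned scale/shift parameters and the backprop details (where `r` and `d` are treated as constants) are omitted.

```python
import numpy as np

def batch_renorm_forward(x, running_mean, running_var, momentum=0.99,
                         r_max=3.0, d_max=5.0, eps=1e-5):
    """Illustrative batch renormalization forward pass (training mode)."""
    # Per-batch statistics, as in plain batch normalization.
    mu_b = x.mean(axis=0)
    sigma_b = x.std(axis=0) + eps

    # Correction factors tie this batch's statistics to the running
    # (all-batch) statistics; clipping stabilizes early training.
    sigma = np.sqrt(running_var) + eps
    r = np.clip(sigma_b / sigma, 1.0 / r_max, r_max)
    d = np.clip((mu_b - running_mean) / sigma, -d_max, d_max)

    # Renormalized output; with r = 1, d = 0 this reduces to batch norm.
    x_hat = (x - mu_b) / sigma_b * r + d

    # Update the moving averages that carry information across batches.
    new_mean = momentum * running_mean + (1 - momentum) * mu_b
    new_var = momentum * running_var + (1 - momentum) * sigma_b ** 2
    return x_hat, new_mean, new_var
```

When the running statistics already match the batch statistics, `r` is near 1 and `d` near 0, and the output coincides with ordinary batch normalization; the corrections only kick in when a batch is off-center relative to the running average.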
Upvotes: 8