Reputation: 161
I am not completely familiar with batch normalization layers. As I understand it, they compute the normalization at training time using mini-batch statistics.
Do any of you have experience using these layers when the mini-batch size is very small (for example, 2 or 4 images per iteration)? Is there any reason it would not work well?
My feeling is that the statistics would be computed on a very small sample at training time, which could negatively affect training. What do you think?
Upvotes: 2
Views: 2814
Reputation: 681
You are right in your intuition that the mini-batch statistics may differ from the population statistics (mini-batch vs. all samples), but this problem was addressed in the batch normalization paper. Specifically, during training you compute the variance of your samples by dividing by the batch size (N), but at test time you correct for this by using the unbiased variance estimate (multiplying by N/(N-1)). Have a look here for a more detailed and easy-to-understand explanation: Batch Normalization
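For intuition, here is a small NumPy sketch (purely illustrative) of the relationship between the biased train-time variance (divide by N) and the unbiased test-time estimate (multiply by N/(N-1)):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=4)   # a tiny mini-batch, N = 4
N = batch.size

# Train-time (biased) variance: divide the sum of squares by N.
biased_var = ((batch - batch.mean()) ** 2).sum() / N

# Test-time correction: multiply by N/(N-1) to get the unbiased estimate.
unbiased_var = biased_var * N / (N - 1)

# These match NumPy's ddof=0 (biased) and ddof=1 (unbiased) variances.
assert np.isclose(biased_var, np.var(batch))
assert np.isclose(unbiased_var, np.var(batch, ddof=1))
```

With N = 4 the correction factor is 4/3, so the gap between the two estimates is noticeable; with large batches N/(N-1) approaches 1 and the distinction matters much less.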
Upvotes: 1