Reputation: 161
I am not completely familiar with batch normalization layers. As I understand it, they compute the normalization at training time using mini-batch statistics.
Do any of you have experience using these layers when the mini-batch size is very small (for example, 2 or 4 images per iteration)? Is there any reason it would not work well?
My feeling is that the statistics would be computed on a very small sample at training time, which could negatively affect training. What do you think?
Upvotes: 2
Views: 2814
Reputation: 681
You are right in your intuition that the mini-batch statistics may differ from the population statistics (mini-batch vs. all samples), but this problem was addressed in the batch normalization paper. Specifically, during training you compute the variance of your samples by dividing by the batch size (N), but at test time you correct for this by using the unbiased variance estimate (multiplying by N/(N-1)). Have a look here for a more detailed and easy-to-understand explanation: Batch Normalization
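For intuition, here is a small NumPy sketch (purely illustrative) of the relationship between the biased train-time variance (divide by N) and the unbiased test-time estimate (multiply by N/(N-1)):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=4)   # a tiny mini-batch, N = 4
N = batch.size

# Train-time (biased) variance: divide the sum of squares by N.
biased_var = ((batch - batch.mean()) ** 2).sum() / N

# Test-time correction: multiply by N/(N-1) to get the unbiased estimate.
unbiased_var = biased_var * N / (N - 1)

# These match NumPy's ddof=0 (biased) and ddof=1 (unbiased) variances.
assert np.isclose(biased_var, np.var(batch))
assert np.isclose(unbiased_var, np.var(batch, ddof=1))
```

With N = 4 the correction factor is 4/3, so the gap between the two estimates is noticeable; with large batches N/(N-1) approaches 1 and the distinction matters much less.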
Upvotes: 1