Reputation: 37
I'm working on image super-resolution tasks with EDSR as a baseline model. Following EDSR, I'm not using any batch-norm layers in my model. A (possibly stupid) question about batch sizes suddenly occurred to me.
Currently, I'm training my model with a batch size of 32 (as in EDSR). But since I'm not using any batch-normalization technique, I can't see any reason for using batch sizes greater than 1. I'm not confident in my reasoning, though, since the authors' implementation also uses batch sizes greater than 1.
Could someone help me with this? What am I missing?
Upvotes: 1
Views: 903
Reputation: 1656
In Rethinking "Batch" in BatchNorm, a study by FAIR, the interaction between batch normalization and batch size is discussed. The figure below (its caption is reproduced underneath) shows the relation between batch normalization and the normalization batch size. It indicates that batch normalization is mainly helpful at larger batch sizes: with a small batch size you gain little from it, because using batch normalization with a small normalization batch size leads to train/test inconsistency.
Figure caption (from the paper): Classification error under different normalization batch sizes, with a fixed total batch size of 1024. Green: error rate on the unaugmented training set using mini-batch statistics; Red: error rate on the validation set using population statistics estimated by PreciseBN; Blue: error rate on the validation set using mini-batch statistics of random batches (with the same normalization batch size used in training). The gap between the red and blue curves is caused by train/test inconsistency, while the gap between the blue and green curves is the generalization gap on unseen data.
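For intuition, here is a minimal PyTorch sketch (the layer size, input shape, and seed are illustrative assumptions, not taken from EDSR or the paper) of the train/test inconsistency the caption describes: in train mode BatchNorm normalizes with the current mini-batch's statistics, while in eval mode it uses running (population) estimates, and with a very small normalization batch size the two outputs can differ noticeably.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    bn = nn.BatchNorm2d(num_features=8)

    # A tiny "mini-batch" of 2 feature maps (N, C, H, W); values are random for illustration.
    x = torch.randn(2, 8, 16, 16)

    bn.train()
    y_train = bn(x)   # normalized with the mean/var of these 2 samples

    bn.eval()
    y_eval = bn(x)    # normalized with the running (population) mean/var

    # The gap between the two outputs is the train/test inconsistency;
    # it shrinks as the normalization batch size grows.
    print((y_train - y_eval).abs().mean())

Since your model has no batch-norm layers at all, this particular inconsistency does not apply to you; the figure is only about what happens when batch normalization is used with different normalization batch sizes.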
Upvotes: 1