Reputation: 59
I have a mid-sized conv net with batch normalization. The effect of batch normalization is tremendously positive (more than a 10x training speed-up and much improved accuracy).
However, there is now a significant gap between training and validation/test accuracy, approaching 10%, and it slowly widens during training. This is disturbing.
The BN implementation uses the standard TF Exponential Moving Average. This does not seem to be the issue, as the training and validation sets share the same statistics, and I also tried "warming up" the moving averages toward the test-set statistics before evaluation; this had no effect. Also, I had to turn off both L2 regularization and dropout for BN to work nicely.
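For reference, a minimal sketch of the kind of moving-average update involved (this uses manual tf.assign updates rather than tf.train.ExponentialMovingAverage directly; the decay value, axes, and variable names are illustrative, not my exact code):

    import tensorflow as tf  # TF 1.x-style graph API assumed

    def batch_norm(x, is_training, decay=0.99, eps=1e-3):
        """BN layer for NHWC conv activations that tracks moving mean/variance."""
        depth = int(x.get_shape()[-1])
        gamma = tf.Variable(tf.ones([depth]), name="gamma")
        beta = tf.Variable(tf.zeros([depth]), name="beta")
        moving_mean = tf.Variable(tf.zeros([depth]), trainable=False)
        moving_var = tf.Variable(tf.ones([depth]), trainable=False)

        def train_branch():
            # Normalize with the statistics of the current mini-batch...
            batch_mean, batch_var = tf.nn.moments(x, axes=[0, 1, 2])
            # ...and fold them into the moving averages used at test time.
            update_mean = tf.assign(moving_mean,
                                    decay * moving_mean + (1.0 - decay) * batch_mean)
            update_var = tf.assign(moving_var,
                                   decay * moving_var + (1.0 - decay) * batch_var)
            with tf.control_dependencies([update_mean, update_var]):
                return tf.nn.batch_normalization(x, batch_mean, batch_var,
                                                 beta, gamma, eps)

        def test_branch():
            # At validation/test time, use only the accumulated moving statistics.
            return tf.nn.batch_normalization(x, moving_mean, moving_var,
                                             beta, gamma, eps)

        return tf.cond(is_training, train_branch, test_branch)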
Has anyone encountered something similar? Any ideas? Any suggestions on how to add "more standard" regularization back to a BN network?
Upvotes: 1
Views: 6238
Reputation: 86
Batch normalization can appear to overfit when the running mean and variance are computed incorrectly. This may happen if the last batch in each pass over the dataset is much smaller than the rest of the batches, and the resulting error accumulates over multiple epochs.
Make sure your last batch has the same size as the rest of the batches, for example by simply dropping the incomplete final batch, as in the sketch below.
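A minimal sketch with the tf.data API (here `images` and `labels` stand in for your training arrays, and 64 for your batch size): `drop_remainder=True` discards the trailing incomplete batch.

    import tensorflow as tf

    batch_size = 64  # whatever your real batch size is

    # Build the input pipeline so every batch has exactly `batch_size` examples;
    # drop_remainder=True throws away the smaller final batch instead of letting
    # it skew the BN running mean/variance.
    # `images` and `labels` are assumed to be your full training arrays.
    dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
               .shuffle(buffer_size=10000)
               .batch(batch_size, drop_remainder=True)
               .repeat())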
Upvotes: 0