Reputation: 59
I have a mid-sized conv net with batch normalization. The effect of batch normalization is tremendously positive (more than a 10x training speed-up and much improved accuracy).
However, there is now a significant gap between training and validation/test accuracy, approaching 10%, and it slowly widens during training. This is disturbing.
The BN implementation uses the standard TF Exponential Moving Average. This does not seem to be the issue, as the training and validation sets share the same statistics, and I also tried "warming up" the moving averages toward the test-set statistics before evaluation; this had no effect. Also, I had to turn off both L2 regularization and dropout for BN to work nicely.
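For reference, a minimal sketch of the kind of moving-average update involved (this uses manual tf.assign updates rather than tf.train.ExponentialMovingAverage directly; the decay value, axes, and variable names are illustrative, not my exact code):

    import tensorflow as tf  # TF 1.x-style graph API assumed

    def batch_norm(x, is_training, decay=0.99, eps=1e-3):
        """BN layer for NHWC conv activations that tracks moving mean/variance."""
        depth = int(x.get_shape()[-1])
        gamma = tf.Variable(tf.ones([depth]), name="gamma")
        beta = tf.Variable(tf.zeros([depth]), name="beta")
        moving_mean = tf.Variable(tf.zeros([depth]), trainable=False)
        moving_var = tf.Variable(tf.ones([depth]), trainable=False)

        def train_branch():
            # Normalize with the statistics of the current mini-batch...
            batch_mean, batch_var = tf.nn.moments(x, axes=[0, 1, 2])
            # ...and fold them into the moving averages used at test time.
            update_mean = tf.assign(moving_mean,
                                    decay * moving_mean + (1.0 - decay) * batch_mean)
            update_var = tf.assign(moving_var,
                                   decay * moving_var + (1.0 - decay) * batch_var)
            with tf.control_dependencies([update_mean, update_var]):
                return tf.nn.batch_normalization(x, batch_mean, batch_var,
                                                 beta, gamma, eps)

        def test_branch():
            # At validation/test time, use only the accumulated moving statistics.
            return tf.nn.batch_normalization(x, moving_mean, moving_var,
                                             beta, gamma, eps)

        return tf.cond(is_training, train_branch, test_branch)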
Has anyone encountered something similar? Any ideas? Any suggestions on how to add "more standard" regularization back to a BN network?
Upvotes: 1
Views: 6238
Reputation: 86
Batch normalization can appear to overfit when the running mean and variance are computed incorrectly. This may happen if the last batch in each pass over the dataset is much smaller than the rest of the batches, and the resulting error accumulates over multiple epochs.
Make sure your last batch has the same size as the rest of the batches, for example by simply dropping the incomplete final batch, as in the sketch below.
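A minimal sketch with the tf.data API (here `images` and `labels` stand in for your training arrays, and 64 for your batch size): `drop_remainder=True` discards the trailing incomplete batch.

    import tensorflow as tf

    batch_size = 64  # whatever your real batch size is

    # Build the input pipeline so every batch has exactly `batch_size` examples;
    # drop_remainder=True throws away the smaller final batch instead of letting
    # it skew the BN running mean/variance.
    # `images` and `labels` are assumed to be your full training arrays.
    dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
               .shuffle(buffer_size=10000)
               .batch(batch_size, drop_remainder=True)
               .repeat())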
Upvotes: 0