Reputation: 578
What batch normalization precisely does at the inference phase is normalize each layer with the population mean and an estimate of the population variance.
But it seems every TensorFlow implementation (including this one and the official TensorFlow implementation) uses an (exponential) moving average of the mean and variance instead.
Please forgive me, but I don't understand why. Is it because a moving average simply gives better performance, or is it purely for the sake of computational speed?
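For context, here is a minimal NumPy sketch of the pattern I mean. The variable names, the decay value, and the helper functions are illustrative, not taken from any particular codebase: during training the implementations keep an exponential moving average of the per-batch statistics, and at inference they normalize with those stored values rather than with true population statistics.

```python
import numpy as np

decay = 0.99  # typical decay/momentum hyperparameter (illustrative value)

moving_mean = np.zeros(3)
moving_var = np.ones(3)

def update_moving_stats(batch):
    """EMA update applied after each training batch (hypothetical helper)."""
    global moving_mean, moving_var
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    moving_mean = decay * moving_mean + (1 - decay) * batch_mean
    moving_var = decay * moving_var + (1 - decay) * batch_var

def batchnorm_inference(x, gamma=1.0, beta=0.0, eps=1e-3):
    """At inference, normalize with the stored moving statistics."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta
```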
Reference: the original paper
Upvotes: 0
Views: 112
Reputation: 3159
The exact update rule for the sample mean is just exponential averaging with a step equal to the inverse sample size. So, if you know the sample size, you could simply set the decay factor to 1/n, where n is the sample size. However, the decay factor usually does not matter as long as it is chosen very close to one: exponential averaging with such a decay rate still provides a very close approximation of the mean and variance, especially on large datasets.
Upvotes: 0