le4m

Reputation: 578

Is there no exact implementation of batch normalization in TensorFlow, and why?

What batch normalization is supposed to do precisely at the inference phase is normalize each layer with the population mean and an estimated population variance:

$$\hat{x} = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}$$

But it seems every TensorFlow implementation (including this one and the official TensorFlow implementation) uses an exponential moving average of the batch mean and variance instead.
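
For concreteness, the update those implementations perform during training looks roughly like the following (a minimal NumPy sketch of the idea with my own variable names, not the actual TensorFlow code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_features = 4
decay = 0.99  # what some TF APIs call "momentum"; typically close to 1

moving_mean = np.zeros(num_features)
moving_var = np.ones(num_features)

# Simulated training loop: 1000 mini-batches of 32 examples
for _ in range(1000):
    batch = rng.normal(loc=2.0, scale=3.0, size=(32, num_features))
    # Exponential moving average of the per-batch statistics
    moving_mean = decay * moving_mean + (1 - decay) * batch.mean(axis=0)
    moving_var = decay * moving_var + (1 - decay) * batch.var(axis=0)

# At inference, the tracked statistics stand in for the population ones
eps = 1e-5
x = rng.normal(loc=2.0, scale=3.0, size=(8, num_features))
x_hat = (x - moving_mean) / np.sqrt(moving_var + eps)
print(moving_mean)  # close to the true mean, 2.0
print(moving_var)   # close to the true variance, 9.0
```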

Please forgive me, but I don't understand why. Is it because using a moving average is just better for performance, or purely for the sake of computational speed?

Reference: the original paper

Upvotes: 0

Views: 112

Answers (1)

Dmytro Danevskyi

Reputation: 3159

The exact update rule for the sample mean is just an exponential average whose step size equals the inverse of the number of samples seen so far. So if you know the sample size, you could set the decay factor to 1 − 1/n (i.e. a step of 1/n), where n is the sample size. In practice, however, the decay factor hardly matters as long as it is chosen very close to one: exponential averaging with such a decay rate still gives a very close approximation of the mean and variance, especially on large datasets.
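
To make that concrete, here is a small NumPy sketch of my own (not from the answer itself) comparing the exact incremental mean, whose step at sample k is 1/k, with a fixed-decay exponential average:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(loc=5.0, scale=2.0, size=10_000)

exact_mean = 0.0
ema_mean = 0.0
decay = 0.999  # fixed decay factor close to one

for k, x in enumerate(xs, start=1):
    # Exact sample mean, written as an incremental update with step 1/k
    exact_mean += (x - exact_mean) / k
    # Exponential moving average with a fixed step of (1 - decay)
    ema_mean = decay * ema_mean + (1 - decay) * x

print(exact_mean)  # exactly the sample mean, ~5.0
print(ema_mean)    # very close to it on a large sample
print(xs.mean())   # sanity check
```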

Upvotes: 0
