Reputation: 578
What batch normalization precisely does at the inference phase is normalize each layer with the population mean and an estimate of the population variance.
But it seems every TensorFlow implementation (including this one and the official TensorFlow implementation) uses an (exponential) moving average of the mean and variance instead.
Please forgive me, but I don't understand why. Is it because a moving average simply gives better performance, or is it purely for the sake of computational speed?
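For context, here is a minimal NumPy sketch of the pattern I mean. The variable names, the decay value, and the helper functions are illustrative, not taken from any particular codebase: during training the implementations keep an exponential moving average of the per-batch statistics, and at inference they normalize with those stored values rather than with true population statistics.

```python
import numpy as np

decay = 0.99  # typical decay/momentum hyperparameter (illustrative value)

moving_mean = np.zeros(3)
moving_var = np.ones(3)

def update_moving_stats(batch):
    """EMA update applied after each training batch (hypothetical helper)."""
    global moving_mean, moving_var
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    moving_mean = decay * moving_mean + (1 - decay) * batch_mean
    moving_var = decay * moving_var + (1 - decay) * batch_var

def batchnorm_inference(x, gamma=1.0, beta=0.0, eps=1e-3):
    """At inference, normalize with the stored moving statistics."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta
```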
Reference: the original paper
Upvotes: 0
Views: 112
Reputation: 3159
The exact update rule for the sample mean is just exponential averaging with a step equal to the inverse sample size. So, if you know the sample size, you could simply set the decay factor to 1/n, where n is the sample size. However, the decay factor usually does not matter as long as it is chosen very close to one: exponential averaging with such a decay rate still provides a very close approximation of the mean and variance, especially on large datasets.
Upvotes: 0