Sikai Yao

Reputation: 327

How does BatchNormalization work in Keras?

I want to know how BatchNormalization works in Keras, so I wrote this code:

import numpy as np
import keras

X_input = keras.Input((2,))
X = keras.layers.BatchNormalization(axis=1)(X_input)
model1 = keras.Model(inputs=X_input, outputs=X)

The input is a batch of two-dimensional vectors, normalized along axis=1. Then I print the output:

a = np.arange(4).reshape((2,2))
print('a=')
print(a)
print('output=')
print(model1.predict(a, batch_size=2))

and the output is:

a=
array([[0, 1],
       [2, 3]])
output=
array([[ 0.        ,  0.99950039],
       [ 1.99900079,  2.9985013 ]], dtype=float32)

I cannot figure out these results. As far as I know, the mean of the batch should be ([0,1] + [2,3])/2 = [1,2], and the variance should be 1/2*(([0,1] - [1,2])^2 + ([2,3] - [1,2])^2) = [1,1]. Normalizing with (x - mean)/sqrt(var) should then give [-1,-1] and [1,1]. Where am I wrong?
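
To make that expectation concrete, here is a plain NumPy sketch (independent of Keras) of the batch statistics I computed by hand:

import numpy as np

a = np.arange(4).reshape((2, 2)).astype(np.float32)
mean = a.mean(axis=0)             # per-feature batch mean -> [1., 2.]
var = a.var(axis=0)               # per-feature batch variance -> [1., 1.]
print((a - mean) / np.sqrt(var))  # -> [[-1., -1.], [1., 1.]]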

Upvotes: 2

Views: 1633

Answers (1)

YSelf

Reputation: 2711

BatchNormalization subtracts a mean, divides by the square root of a variance, then applies a scale factor gamma and an offset beta. If these parameters were actually the mean and variance of your batch, the result would be centered around zero with variance 1.

But they are not. The Keras BatchNormalization layer stores these as layer weights called moving_mean, moving_variance, beta and gamma (beta and gamma are trainable; the moving statistics are updated during training). They are initialized as beta=0, gamma=1, moving_mean=0 and moving_variance=1. model.predict runs the layer in inference mode, which uses the moving statistics rather than your batch statistics, and since you never ran a training step these weights still hold their initial values, so BatchNorm leaves your values (almost) unchanged.
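
You can check the initial weights yourself. A minimal sketch, assuming the layer's defaults (center=True, scale=True), where get_weights() returns gamma, beta, moving_mean and moving_variance in that order:

bn = model1.layers[1]  # the BatchNormalization layer from the question
gamma, beta, moving_mean, moving_var = bn.get_weights()
print(gamma)        # [1. 1.]
print(beta)         # [0. 0.]
print(moving_mean)  # [0. 0.]
print(moving_var)   # [1. 1.]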

So why don't you get back exactly your input values? Because there is another parameter, epsilon (a small number, 0.001 by default), which is added to the variance before the square root is taken. In inference mode the layer computes gamma * (x - moving_mean) / sqrt(moving_variance + epsilon) + beta, so with the initial weights every value is divided by sqrt(1 + epsilon) and ends up slightly below its input value.
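
A quick NumPy sketch of that inference formula with the initial weights reproduces the printed output (up to float32 rounding):

import numpy as np

eps = 1e-3  # Keras default epsilon for BatchNormalization
x = np.arange(4).reshape((2, 2)).astype(np.float32)
# gamma * (x - moving_mean) / sqrt(moving_variance + eps) + beta
# with gamma=1, beta=0, moving_mean=0, moving_variance=1:
print(x / np.sqrt(1.0 + eps))
# [[0.         0.9995004 ]
#  [1.9990008  2.9985011 ]]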

Upvotes: 2
