Reputation: 537
I tried batch normalization on the toy set [[1,2],[5,4]]. Normalizing along axis=0, we get
#[[-1/sqrt(2),-1/sqrt(2)],[1/sqrt(2), 1/sqrt(2)]]
However, my layer(axis=0) and layer(axis=1) both give incorrect results.
import tensorflow as tf
from tensorflow import keras

X = tf.constant([[1, 2], [5, 4]], dtype=tf.float32)
layer = keras.layers.BatchNormalization(axis=0)
hidden = layer(X)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(layer.trainable_weights))
    print(sess.run(hidden))
#results
#[array([1., 1.], dtype=float32), array([0., 0.], dtype=float32)]
#[[0.9995004 4.997502 ]
# [1.9990008 3.9980016]]
X = tf.constant([[1, 2], [5, 4]], dtype=tf.float32)
layer = keras.layers.BatchNormalization(axis=1)
hidden = layer(X)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(layer.trainable_weights))
    print(sess.run(hidden))
#results
#[array([1., 1.], dtype=float32), array([0., 0.], dtype=float32)]
#[[0.9995004 4.997502 ]
# [1.9990008 3.9980016]]
gamma=1 and beta=0, as trainable_weights shows. So how does this layer work?
Upvotes: 0
Views: 288
Reputation: 59294
This is only a toy model with no neurons, and there is no optimization going on here. Batch normalization won't change your X variable because by definition it is a constant.
What it does is: in the process of training a neural network, it transforms the outputs of some layer into normalized inputs for the next layer, which helps to train the next layer's weights. I am not a Keras user, but I'd guess you might only be able to check the normalized outputs of a layer by inspecting the TensorFlow nodes directly (if at all).
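For illustration only, here is a minimal sketch (an assumption on my part, not something verified in this answer; it uses the same TF 1.x + Keras setup as in the question and relies on the layer's standard training argument): when the layer is called in training mode, it normalizes with the current batch's statistics rather than with the moving averages it keeps for inference.
import tensorflow as tf
from tensorflow import keras

X = tf.constant([[1, 2], [5, 4]], dtype=tf.float32)
layer = keras.layers.BatchNormalization()
# training=True makes the layer use the batch mean/variance (biased, plus
# a small epsilon) instead of its moving averages.
hidden = layer(X, training=True)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # roughly (X - column mean) / sqrt(column variance + eps) per column
    print(sess.run(hidden))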
To answer the title of your question, Batch Normalization in itself is just standard z-score normalization. It is the same as subtracting the mean and dividing by the standard deviation of the series.
In mathematical notation, z = (x − μ) / σ, where μ is the series mean and σ its standard deviation.
In code, where arr is a numpy array,
(arr - arr.mean(axis=0))/arr.std(axis=0, ddof=1)
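Applied to the toy batch from the question (a quick check, assuming NumPy is available), this reproduces the expected ±1/sqrt(2) values:
import numpy as np

arr = np.array([[1., 2.], [5., 4.]])
# z-score normalization over the batch axis, using the sample std (ddof=1)
print((arr - arr.mean(axis=0)) / arr.std(axis=0, ddof=1))
# [[-0.70710678 -0.70710678]
#  [ 0.70710678  0.70710678]]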
The idea of normalizing is to get your distribution closer to a standard normal with mean 0 and standard deviation 1, i.e. ~ N(0,1).
It has been discussed lately (e.g. here and here) that by renormalizing your batches you can train your Neural Networks faster by reducing internal covariate shift.
Upvotes: 2