zonzon510

Reputation: 163

Can someone explain the behaviour of tf.keras.layers.BatchNormalization?

From the TensorFlow documentation:

https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization

"Normalize the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1."

Therefore, I expect this layer to first compute the mean and standard deviation of the previous layer's output, then subtract the mean and divide by the standard deviation for each sample in the batch. But apparently I'm wrong.

import numpy as np
import tensorflow as tf


if __name__ == "__main__":
    # flattened tensor, batch size of 2
    xnp = np.array([[1, 2, 3], [4, 5, 6]])
    xtens = tf.constant(xnp, dtype=tf.float32)

    nbatchnorm = tf.keras.layers.BatchNormalization()(xtens)

    # tensorflow output
    print(nbatchnorm)

    # what I expect to see: mean 0 and standard deviation 1 for each sample
    xmean = np.mean(xnp, axis=1)
    xstd = np.std(xnp, axis=1)
    normalized = (xnp - xmean.reshape(-1, 1)) / xstd.reshape(-1, 1)

    print(normalized)

output:

tf.Tensor(
[[0.9995004 1.9990008 2.9985013]
 [3.9980016 4.997502  5.9970026]], shape=(2, 3), dtype=float32)

[[-1.22474487  0.          1.22474487]
 [-1.22474487  0.          1.22474487]]

Can someone please explain to me why these outputs are not the same, or at least similar? I don't see how this is normalizing anything.

Upvotes: 2

Views: 741

Answers (1)

user11530462

Reputation:

Well, Batch Normalization depends on several quantities, defined by the algorithm below.

  1. μ_B = (1 / m_B) · Σ_{i=1..m_B} x^(i)
  2. σ_B² = (1 / m_B) · Σ_{i=1..m_B} (x^(i) − μ_B)²
  3. x̂^(i) = (x^(i) − μ_B) / √(σ_B² + ε)
  4. z^(i) = γ ⊗ x̂^(i) + β

  • μ_B is the vector of input means, evaluated over the whole mini-batch B (it contains one mean per input).
  • σ_B is the vector of input standard deviations, also evaluated over the whole mini-batch (it contains one standard deviation per input).
  • m_B is the number of instances in the mini-batch.
  • x̂^(i) is the vector of zero-centered and normalized inputs for instance i.
  • γ is the output scale parameter vector for the layer (it contains one scale parameter per input).
  • ⊗ represents element-wise multiplication (each input is multiplied by its corresponding output scale parameter).
  • β is the output shift (offset) parameter vector for the layer (it contains one offset parameter per input). Each input is offset by its corresponding shift parameter.
  • ε is a tiny number that avoids division by zero (typically 10⁻⁵). This is called a smoothing term.
  • z^(i) is the output of the BN operation. It is a rescaled and shifted version of the inputs.
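
To connect these formulas with the output in the question, here is a minimal numerical check, assuming TensorFlow 2.x and the layer's default arguments (axis=-1, epsilon=0.001, γ initialized to ones, β to zeros). Two things explain the question's output: the layer normalizes per feature (over the batch axis), not per sample, and when it is called directly like that it runs in inference mode, where it uses its moving mean (initialized to 0) and moving variance (initialized to 1) instead of the batch statistics, so the output is roughly x / √(1 + ε) ≈ 0.9995 · x.

import numpy as np
import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# default arguments assumed: axis=-1, epsilon=1e-3, gamma=1, beta=0
bn = tf.keras.layers.BatchNormalization()

# inference mode: normalizes with the moving mean (0) and moving
# variance (1), so the output is x / sqrt(1 + eps) ~= 0.9995 * x,
# which is the output shown in the question
print(bn(x, training=False))

# training mode: normalizes with the statistics of the current batch,
# computed per feature (over axis 0), not per sample
print(bn(x, training=True))

# the same computation by hand, following the formulas above
eps = 1e-3
mu = np.mean(x.numpy(), axis=0)    # mu_B: one mean per feature
var = np.var(x.numpy(), axis=0)    # sigma_B squared: one variance per feature
print((x.numpy() - mu) / np.sqrt(var + eps))   # matches the training=True output

With training=True the layer's output matches the manual computation (every entry is about ±0.9998): one mean and one variance per feature, which is exactly the per-input normalization described in the list above.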

Upvotes: 1
