Reputation: 10926
Given the input values [1, 5], normalizing them should yield something like [-1, 1], if I understand correctly, because
mean = 3
var = 4
result = (x - mean) / sqrt(var)
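A quick numpy check of that expectation (plain normalization by hand, no Keras involved):
import numpy as np
x = np.array([1.0, 5.0])
mean = x.mean()                   # 3.0
var = x.var()                     # 4.0
print((x - mean) / np.sqrt(var))  # [-1.  1.]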
However, this minimal example
import numpy as np
import keras
from keras.models import Model
from keras.layers import Input
from keras.layers.normalization import BatchNormalization
from keras import backend as K
shape = (1,2,1)
input = Input(shape=shape)
x = BatchNormalization(center=False)(input) # no beta
model = Model(inputs=input, outputs=x)
model.compile(loss='mse', optimizer='sgd')
# training with dummy data
training_in = [np.random.random(size=(10, *shape))]
training_out = [np.random.random(size=(10, *shape))]
model.fit(training_in, training_out, epochs=10)
data_in = np.array([[[[1], [5]]]], dtype=np.float32)
data_out = model.predict(data_in)
print('gamma :', K.eval(model.layers[1].gamma))
#print('beta :', K.eval(model.layers[1].beta))
print('moving_mean:', K.eval(model.layers[1].moving_mean))
print('moving_variance:', K.eval(model.layers[1].moving_variance))
print('epsilon :', model.layers[1].epsilon)
print('data_in :', data_in)
print('data_out:', data_out)
produces the following output:
gamma : [ 0.80644524]
moving_mean: [ 0.05885344]
moving_variance: [ 0.91000736]
epsilon : 0.001
data_in : [[[[ 1.]
[ 5.]]]]
data_out: [[[[ 0.79519051]
[ 4.17485714]]]]
So it is [0.79519051, 4.17485714] instead of [-1, 1].
I had a look at the source, and the values seem to be forwarded to tf.nn.batch_normalization. Based on that, the result should be what I expect, but obviously it is not.
So how are the output values calculated?
Upvotes: 2
Views: 1748
Reputation: 10926
The correct formula is this:
result = gamma * (input - moving_mean) / sqrt(moving_variance + epsilon) + beta
And here is a script for verification:
import math
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import Input
from keras.layers.normalization import BatchNormalization
np.random.seed(0)
print('=== keras model ===')
input_shape = (1,2,1)
input = Input(shape=input_shape)
x = BatchNormalization()(input)
model = Model(inputs=input, outputs=x)
model.compile(loss='mse', optimizer='sgd')
training_in = [np.random.random(size=(10, *input_shape))]
training_out = [np.random.random(size=(10, *input_shape))]
model.fit(training_in, training_out, epochs=100, verbose=0)
data_in = [[[1.0], [5.0]]]
data_model = np.array([data_in])
result = model.predict(data_model)
gamma = K.eval(model.layers[1].gamma)
beta = K.eval(model.layers[1].beta)
moving_mean = K.eval(model.layers[1].moving_mean)
moving_variance = K.eval(model.layers[1].moving_variance)
epsilon = model.layers[1].epsilon
print('gamma: ', gamma)
print('beta: ', beta)
print('moving_mean: ', moving_mean)
print('moving_variance:', moving_variance)
print('epsilon: ', epsilon)
print('data_in: ', data_in)
print('result: ', result)
print('=== numpy ===')
np_data = [data_in[0][0][0], data_in[0][1][0]]
np_mean = moving_mean[0]
np_variance = moving_variance[0]
np_offset = beta[0]
np_scale = gamma[0]
np_result = [np_scale * (x - np_mean) / math.sqrt(np_variance + epsilon) + np_offset for x in np_data]
print(np_result)
print('=== tensorflow ===')
tf_data = tf.constant(data_in)
tf_mean = tf.constant(moving_mean)
tf_variance = tf.constant(moving_variance)
tf_offset = tf.constant(beta)
tf_scale = tf.constant(gamma)
tf_variance_epsilon = epsilon
tf_result = tf.nn.batch_normalization(tf_data, tf_mean, tf_variance, tf_offset, tf_scale, tf_variance_epsilon)
tf_sess = tf.Session()
print(tf_sess.run(tf_result))
print('=== keras backend ===')
k_data = K.constant(data_in)
k_mean = K.constant(moving_mean)
k_variance = K.constant(moving_variance)
k_offset = K.constant(beta)
k_scale = K.constant(gamma)
k_variance_epsilon = epsilon
k_result = K.batch_normalization(k_data, k_mean, k_variance, k_offset, k_scale, k_variance_epsilon)
print(K.eval(k_result))
Output:
gamma: [ 0.22297101]
beta: [ 0.49253803]
moving_mean: [ 0.36868709]
moving_variance: [ 0.41429576]
epsilon: 0.001
data_in: [[[1.0], [5.0]]]
result: [[[[ 0.71096909]
[ 2.09494853]]]]
=== numpy ===
[0.71096905498374263, 2.0949484904433255]
=== tensorflow ===
[[[ 0.71096909]
[ 2.09494853]]]
=== keras backend ===
[[[ 0.71096909]
[ 2.09494853]]]
Upvotes: 0
Reputation: 6220
If you're using gamma, the right equation for batch normalization is actually result = gamma * (x - mean) / sqrt(var), BUT mean and var are not always the same:
During training (fit), they are mean_batch and var_batch, calculated from the input values of the batch (they are just the mean and variance of your batch), just as you're doing. Meanwhile, a global moving_mean and moving_variance are learnt this way:
moving_mean = alpha * moving_mean + (1 - alpha) * mean_batch
(and likewise for moving_variance), where alpha is a kind of learning rate, in (0, 1), usually above 0.9. moving_mean and moving_variance are approximations of the real mean and variance of all your training data. gamma is also learnt, by the usual gradient descent, to best fit your output.
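A minimal numpy sketch of that update rule (dummy batches, alpha chosen arbitrarily; not the actual Keras internals):
import numpy as np
alpha = 0.99                                    # smoothing factor ("momentum" in Keras)
moving_mean, moving_variance = 0.0, 1.0         # Keras initializes them to 0 and 1
for batch in np.random.random(size=(10, 20)):   # 10 dummy batches of 20 values
    moving_mean = alpha * moving_mean + (1 - alpha) * batch.mean()
    moving_variance = alpha * moving_variance + (1 - alpha) * batch.var()
print(moving_mean, moving_variance)             # drift slowly towards the data statistics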
During inference (predict), you just use the learnt values of moving_mean and moving_variance, never mean_batch and var_batch. You also use the learnt gamma.
So 0.05885344 is just an approximation of the mean of your random input data, and 0.91000736 of its variance, and you're using these to normalize your new data [1, 5]. You can easily check that [0.79519051, 4.17485714] = gamma * ([1, 5] - moving_mean) / sqrt(moving_variance + epsilon).
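Plugging the printed values into that formula (note the epsilon that Keras adds under the square root):
import numpy as np
gamma = 0.80644524
moving_mean = 0.05885344
moving_variance = 0.91000736
epsilon = 0.001
x = np.array([1.0, 5.0])
print(gamma * (x - moving_mean) / np.sqrt(moving_variance + epsilon))
# [ 0.79519051  4.17485714] -- matches data_out from the question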
Edit: alpha is called momentum in Keras, if you want to check it.
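For example (same import as in the question; momentum defaults to 0.99 in Keras 2):
from keras.layers import Input
from keras.layers.normalization import BatchNormalization
inp = Input(shape=(1, 2, 1))
x = BatchNormalization(momentum=0.9)(inp)  # momentum plays the role of alpha above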
Upvotes: 2