XiangyuZhang

Reputation: 11

How can I use Batch Normalization to normalize the batch dimension?

I want to use BatchNormalization to normalize the batch dimension, but the batch dimension in Keras is naturally None. So what can I do?

The Keras example shows axis=-1 for Conv2D, which refers to the channel dimension.

keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)

axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.
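For reference, here is a minimal sketch (assuming TensorFlow 2.x with tf.keras; the layer sizes are made up for illustration) of the standard usage the docs describe, where BN normalizes the channel axis after a Conv2D layer:

    # Standard channel-axis BN: one mean/variance pair per channel.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, input_shape=(32, 32, 3)),  # channels_last
        tf.keras.layers.BatchNormalization(axis=-1),             # normalize the channel axis
        tf.keras.layers.Activation("relu"),
    ])
    model.summary()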

Upvotes: 0

Views: 294

Answers (1)

pitfall

Reputation: 2621

It simply makes no sense to apply the BN layer to the batch axis.

Why? If this were possible, you would end up learning BN parameters in the form of several trainable vectors of length batch_size. OK, so what? You could still train such a model without seeing an error message.

But what about testing? Such a BN layer implies that you have to run inference with exactly the same batch_size as in training. Otherwise, the tensor operations are ill-defined and you will see an error.
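To make this concrete, here is a minimal sketch (assuming TensorFlow 2.x with tf.keras; the shapes are made up, and the exact behavior and error message may vary by version) of a model that normalizes the batch axis. It can only be built if the batch size is pinned to a concrete value, and inference with any other batch size then fails:

    import numpy as np
    import tensorflow as tf

    BATCH = 32

    # The batch dimension must be pinned, otherwise BatchNormalization(axis=0)
    # cannot create its length-BATCH weight vectors.
    inputs = tf.keras.Input(shape=(10,), batch_size=BATCH)
    x = tf.keras.layers.BatchNormalization(axis=0)(inputs)  # normalize the batch axis
    outputs = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")

    # Training works as long as every batch holds exactly BATCH samples.
    data = np.random.rand(BATCH * 4, 10).astype("float32")
    labels = np.random.rand(BATCH * 4, 1).astype("float32")
    model.fit(data, labels, batch_size=BATCH, epochs=1, verbose=0)

    # Inference with any other batch size is ill-defined: gamma, beta and the
    # moving statistics all have length BATCH, so this should raise an error.
    model.predict(data[:1])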

More importantly, the BN you propose treats samples differently according to their relative positions in a batch: samples that land in the 1st position of a batch are always normalized with one set of parameters, while samples at a different position are normalized with another set. Again, you may say: so what?
However, you will have to shuffle your training samples anyway, which means these relative positions in a batch are completely meaningless. In other words, learning anything about these relative positions is doomed to fail.

Upvotes: 1
