Jenia Golbstein

Reputation: 374

Pytorch Batchnorm layer different from Keras Batchnorm

I'm trying to copy pre-trained BN weights from a PyTorch model to its equivalent Keras model, but I keep getting different outputs.

I read the Keras and PyTorch BN documentation, and I think the difference lies in the way they calculate the "mean" and "var".

Pytorch:

The mean and standard-deviation are calculated per-dimension over the mini-batches

source: Pytorch BatchNorm

Thus, they average over samples.
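
As I understand it (a rough sketch with made-up shapes; momentum=1.0 is set only so the running stats equal the stats of this single batch), for a 4-D input this means one statistic per channel, reduced over the batch and spatial dimensions:

import torch

x = torch.rand(16, 3, 8, 8)  # (batch, channels, height, width)

# momentum=1.0 makes running_mean equal to the last batch's mean
bn = torch.nn.BatchNorm2d(num_features=3, momentum=1.0)
bn.train()
bn(x)

# one mean per channel, averaged over the batch and spatial dims (0, 2, 3)
manual_mean = x.mean(dim=(0, 2, 3))
print(torch.allclose(bn.running_mean, manual_mean))  # True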

Keras:

axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.

source: Keras BatchNorm

and here they average over the features (channels).

What's the right way? How to transfer BN weights between the models?
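
For reference, here is how I inspect what each layer actually stores (a sketch with hypothetical layer sizes; the Keras layer has to be built before its weights exist):

import torch
from tensorflow import keras

torch_bn = torch.nn.BatchNorm2d(num_features=3)
print([name for name, _ in torch_bn.named_parameters()])
# ['weight', 'bias']
print([name for name, _ in torch_bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']

keras_bn = keras.layers.BatchNormalization(axis=1)
keras_bn.build(input_shape=(None, 3, 8, 8))
print([w.name for w in keras_bn.weights])
# gamma, beta, moving_mean, moving_variance (in this order)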

Upvotes: 3

Views: 2126

Answers (1)

Andrea Quattrini

Reputation: 153

You can retrieve moving_mean and moving_variance from the running_mean and running_var attributes of the PyTorch module:

# torch weight, bias, running_mean, running_var correspond to
# keras gamma, beta, moving_mean, moving_variance

weights = torch_module.weight.detach().numpy()
bias = torch_module.bias.detach().numpy()
running_mean = torch_module.running_mean.numpy()
running_var = torch_module.running_var.numpy()

# keras BatchNormalization stores its weights in exactly this order
keras_module.set_weights([weights, bias, running_mean, running_var])
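
A quick way to check the transfer (a sketch with hypothetical shapes; note that PyTorch uses eps=1e-5 by default while tf.keras uses epsilon=1e-3, so the epsilons have to be matched explicitly, and fused=False avoids the fused NCHW kernel that is not available on every setup):

import numpy as np
import torch
from tensorflow import keras

torch_module = torch.nn.BatchNorm2d(num_features=8)
torch_module.eval()  # use the running statistics, not the batch statistics

keras_module = keras.layers.BatchNormalization(axis=1, epsilon=1e-5, fused=False)
keras_module.build(input_shape=(None, 8, 4, 4))  # create the variables before set_weights

weights = torch_module.weight.detach().numpy()
bias = torch_module.bias.detach().numpy()
running_mean = torch_module.running_mean.numpy()
running_var = torch_module.running_var.numpy()
keras_module.set_weights([weights, bias, running_mean, running_var])

x = np.random.rand(2, 8, 4, 4).astype("float32")
out_torch = torch_module(torch.from_numpy(x)).detach().numpy()
out_keras = keras_module(x, training=False).numpy()
print(np.abs(out_torch - out_keras).max())  # should be on the order of 1e-6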

Upvotes: 0
