Reputation: 21
In an attempt to understand how BatchNorm1d works in PyTorch, I tried to match the output of the BatchNorm1d operation on a 2D tensor with manually normalizing it. The manual output seems to be scaled down by a factor of 0.9747. Here's the code (note that affine is set to False):
import torch
import torch.nn as nn
from torch.autograd import Variable

# 20 samples, 100 features
X = torch.randn(20, 100) * 5 + 10
X = Variable(X)
B = nn.BatchNorm1d(100, affine=False)
y = B(X)

# manually normalize one feature column
mu = torch.mean(X[:, 1])
var_ = torch.var(X[:, 1])
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:, 1] - mu) / sigma

# the ratio below should be equal to one
print(x.data / y[:, 1].data)
Output is:
0.9747
0.9747
0.9747
....
Doing the same thing for BatchNorm2d works without any issues. How does BatchNorm1d calculate its output?
Upvotes: 0
Views: 886
Reputation: 21
Found out the reason. torch.var uses Bessel's correction (dividing by N-1) when calculating the variance, while BatchNorm1d normalizes with the biased variance (dividing by N). With a batch of 20 samples this accounts for exactly the observed factor: sqrt(19/20) ≈ 0.9747. Passing the argument unbiased=False gives identical values.
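For reference, a minimal sketch of the corrected check: the question's code with unbiased=False, with the deprecated Variable wrapper dropped since plain tensors work in current PyTorch:

import torch
import torch.nn as nn

X = torch.randn(20, 100) * 5 + 10
B = nn.BatchNorm1d(100, affine=False)
y = B(X)

mu = torch.mean(X[:, 1])
# biased variance (divide by N, not N-1), matching what BatchNorm1d uses
var_ = torch.var(X[:, 1], unbiased=False)
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:, 1] - mu) / sigma

# every element of the ratio is now 1.0
print(x / y[:, 1])
# and the earlier mismatch factor is just sqrt((N-1)/N):
print((19 / 20) ** 0.5)  # 0.9747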
Upvotes: 2