Reputation: 135
I am studying how to train a normalizing flow model from the tutorial below, and I don't understand how the log-determinant expression `0 - log(256) * (28*28*1)` arises.
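For reference, a uniform-dequantization step of the following shape would produce exactly that constant. This is a minimal sketch, not code quoted from the tutorial; the name `dequantize` and the exact form of the noise and rescaling are assumptions:

```python
import numpy as np
import torch

def dequantize(z, ldj):
    # Hypothetical sketch: add uniform noise to the integer pixel values,
    # then rescale from [0, 256) to [0, 1).  Dividing by 256 is an
    # elementwise map with derivative 1/256, so it contributes
    # -log(256) per dimension to the log-det-Jacobian accumulator.
    z = z.to(torch.float32)
    z = z + torch.rand_like(z)
    z = z / 256.0
    ldj = ldj - np.log(256) * np.prod(z.shape[1:])
    return z, ldj

# For MNIST-shaped inputs (B, 1, 28, 28) and ldj starting at 0, this
# gives 0 - log(256) * (28*28*1) = -784 * log(256) ≈ -4347.42.
x = torch.randint(0, 256, (8, 1, 28, 28))
z, ldj = dequantize(x, torch.zeros(1))
print(ldj)  # ≈ tensor([-4347.42])
```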
Upvotes: 0
Views: 734
Reputation: 95
Note that 1-self.alpha
is the derivative of the scaling operation, so the Jacobian of this operation is a diagonal matrix with np.prod(z.shape[1:])
entries on the diagonal, each equal to 1-self.alpha. The Jacobian determinant is therefore simply the product of these diagonal entries, which gives rise to
ldj += np.log(1-self.alpha) * np.prod(z.shape[1:])
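To make the diagonal-Jacobian argument concrete, here is a quick numerical check. This is a minimal sketch; the exact form of the scaling, z * (1 - alpha) + 0.5 * alpha, is an assumption about what the tutorial does around the sigmoid:

```python
import numpy as np
import torch

alpha = 1e-5
scale = lambda z: z * (1 - alpha) + 0.5 * alpha  # assumed scaling operation

z = torch.rand(4)  # tiny input so the full Jacobian is cheap to build
J = torch.autograd.functional.jacobian(scale, z)
print(J)                              # diagonal matrix with 1 - alpha on the diagonal
print(torch.logdet(J).item())         # ≈ 4 * log(1 - alpha)
print(np.log(1 - alpha) * z.numel())  # the closed form: same value
```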
The second line accounts for the log determinant of the sigmoid $s(z)$, since $s'(z)=s(z)(1-s(z))$. The two lines thus result from applying the chain rule, which turns the product of determinants into a sum once the logarithm is taken.
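Written out, using $\log s(z) = -\operatorname{softplus}(-z)$, $\log(1-s(z)) = -\operatorname{softplus}(z)$ and the identity $\operatorname{softplus}(z) = z + \operatorname{softplus}(-z)$:

$$\log s'(z) = \log s(z) + \log\big(1-s(z)\big) = -\operatorname{softplus}(-z) - \operatorname{softplus}(z) = -z - 2\operatorname{softplus}(-z),$$

which, summed over the non-batch dimensions, is the quantity the second line adds to ldj.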
Setting ldj = torch.zeros(1,)
is just the initialization of this variable; its value will only be updated inside the module. I'm not sure what the motivation is, but it could be that they want to apply the dequant_module to each individual sample in the batch.
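On that last point, here is a minimal sketch of how ldj is typically threaded through a flow, assuming each flow module takes and returns the pair (z, ldj); the names are hypothetical:

```python
import torch

def forward(x, flows):
    z = x
    # One log-det-Jacobian accumulator per sample; torch.zeros(1,) would
    # also work because it broadcasts against per-sample updates.
    ldj = torch.zeros(x.shape[0], device=x.device)
    for flow in flows:
        # Each layer returns its transformed output and adds its own
        # log|det J| to the running total (the log of a product of
        # determinants becomes a sum).
        z, ldj = flow(z, ldj)
    return z, ldj
```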
Upvotes: 1