Paul_0

Reputation: 378

Torch backward PowBackward0 causes nan gradient where it shouldn't

I have a PyTorch tensor with NaN values inside. When I calculate a simple MSE loss, the gradient becomes NaN even though I mask out the NaN values.

Weirdly, this happens only when the mask is applied after calculating the loss, and only when the loss contains a pow operation. The various cases follow:

import torch
torch.autograd.set_detect_anomaly(True)

x = torch.rand(10, 10) 
y = torch.rand(10, 10)
w = torch.rand(10, 10, requires_grad=True)
y[y > 0.5] = torch.nan


o = w @ x
l = (y - o)**2
l = l[~y.isnan()]

try:
    l.mean().backward(retain_graph=True)
except RuntimeError:
    print('(y-o)**2 caused nan gradient')

l = (y - o)
l = l[~y.isnan()]

try:
    l.mean().backward(retain_graph=True)
except RuntimeError:
    pass
else:
    print('y-o does not cause nan gradient')

l = (y[~y.isnan()] - o[~y.isnan()])**2
l.mean().backward()
print('masking before pow does not propagate nan gradient')

What makes NaN gradients propagate when passing through the backward pass of the pow function?

Upvotes: 0

Views: 121

Answers (1)

Karl

Reputation: 5473

The nans don't come from the gradient; they come from the forward pass. Those forward-pass nan values get multiplied by the gradient values in the backward pass (chain rule).
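A minimal standalone sketch of that mechanism (not using the question's tensors): pow saves its forward input and multiplies by it in the backward pass, so a nan input makes the local derivative nan, and a zero upstream gradient cannot cancel it because 0 * nan is still nan. Subtraction, by contrast, has a constant local derivative and stays finite.

import torch

# pow's backward uses the saved forward value: d/dz (z**2) = 2*z.
# A nan forward value makes the local derivative nan, and 0 * nan is nan.
z = torch.tensor([1.0, float('nan')], requires_grad=True)
(z ** 2).backward(gradient=torch.tensor([1.0, 0.0]))  # zero upstream grad for the nan element
print(z.grad)  # tensor([2., nan])

# Subtraction's local derivative is the constant -1; it never touches the forward value.
z2 = torch.tensor([1.0, float('nan')], requires_grad=True)
(5.0 - z2).backward(gradient=torch.tensor([1.0, 0.0]))
print(z2.grad)  # tensor([-1., -0.]) -- finite, no nan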

Take a simpler example. Set exactly one value in y to nan:

x = torch.rand(10, 10) 
y = torch.rand(10, 10)
w = torch.rand(10, 10, requires_grad=True)
y[0,0] = torch.nan

Now compute your intermediates and retain gradients

o = w@x
o.retain_grad()

l = (y - o).pow(2)
l.retain_grad()

l_nonnan = l[~y.isnan()]
l_nonnan.retain_grad()

l_nonnan.mean().backward()

Inspect the gradients

  • l_nonnan has full gradients
  • l has full gradients except for l.grad[0,0] which is 0
  • o has a nan gradient at o.grad[0,0]
  • w has nan gradients for the entire first row
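Continuing the snippet above, these four points can be checked directly (expected results in the comments):

print(torch.isnan(l_nonnan.grad).any())  # tensor(False) -- all finite
print(l.grad[0, 0])                      # tensor(0.) -- masked element gets a zero upstream gradient
print(o.grad[0, 0])                      # tensor(nan)
print(torch.isnan(w.grad[0]).all())      # tensor(True) -- whole first row of w.grad is nan
print(torch.isnan(w.grad[1:]).any())     # tensor(False) -- the other rows are fine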

This is due to how the computation propagates. We set y[0,0] = torch.nan, so when we compute l = (y - o).pow(2), the forward value l[0,0] is nan because it directly interacts with the nan from y. The backward of pow needs that forward value (the local derivative is 2*(y - o)), so the gradient flowing back into o[0,0] is nan even though the upstream gradient l.grad[0,0] is 0: multiplying 0 by nan still gives nan.

o is created via o = w @ x, which means o[0,0] = (w[0] * x[:,0]).sum(). When we run the computation in reverse during backprop, the gradient of o[0,0] (which we know to be nan) propagates back to every element of w[0]. This is why the entire first row has nan gradients.
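As a sanity check (assuming the tensors from the snippet above are still in scope), the matmul backward can be reproduced by hand: for o = w @ x, the gradient with respect to w is o.grad @ x.T, so the single nan at o.grad[0, 0] spreads across the whole first row of w.grad.

manual_w_grad = o.grad @ x.T                          # backward of o = w @ x with respect to w
print(torch.isnan(manual_w_grad[0]).all())            # tensor(True) -- first row contaminated
print(torch.allclose(manual_w_grad[1:], w.grad[1:]))  # True -- matches autograd on the clean rows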

When you set a bunch of nans randomly, you get the same effect on more elements.

You can avoid this with l = (y[~y.isnan()] - o[~y.isnan()])**2, because masking before the subtraction and pow prevents the nans in y from entering the computation in the first place.
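For completeness, a self-contained version of that masking-first approach; w ends up with fully finite gradients:

import torch

x = torch.rand(10, 10)
y = torch.rand(10, 10)
w = torch.rand(10, 10, requires_grad=True)
y[y > 0.5] = torch.nan

o = w @ x
mask = ~y.isnan()
loss = ((y[mask] - o[mask]) ** 2).mean()  # nans never enter the graph
loss.backward()
print(torch.isnan(w.grad).any())          # tensor(False)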

Upvotes: 0
