Reputation: 11628
When using torch.nn.BCELoss() on two arguments that are both results of some earlier computation, I get a curious error, which this question is about:
RuntimeError: the derivative for 'target' is not implemented
The MCVE is as follows:
import torch
import torch.nn.functional as F
net1 = torch.nn.Linear(1,1)
net2 = torch.nn.Linear(1,1)
loss_fcn = torch.nn.BCELoss()
x = torch.zeros((1,1))
y = F.sigmoid(net1(x)) #make sure y is in range (0,1)
z = F.sigmoid(net2(y)) #make sure z is in range (0,1)
loss = loss_fcn(z, y) #works if we replace y with y.detach()
loss.backward()
It turns out that if we call .detach() on y, the error disappears. But this results in a different computation: now, in the .backward() pass, the gradients with respect to the second argument of BCELoss will not be computed.
Can anyone explain what I'm doing wrong in this case? As far as I know, all PyTorch modules in torch.nn should support computing gradients. And this error message seems to tell me that the derivative is not implemented for y, which is strange: you can compute the gradient of y, but not of y.detach(), which seems contradictory.
Upvotes: 3
Views: 3248
Reputation: 11
I ran into the same problem. As far as I know, the second argument of BCELoss(input, target), the target, should be a tensor that does not carry gradient information, i.e. target.requires_grad should be False. But I don't know why.
Usually the target (we can also call it the ground truth) does not have gradients attached. But here the target (y in your code) was calculated by F.sigmoid(net1(x)), which means the target (the output of net1) is a tensor that requires gradients.
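You can check this directly (a quick check of my own, not part of the original setup; I use torch.sigmoid, which is equivalent here):
import torch
net1 = torch.nn.Linear(1,1)
x = torch.zeros((1,1))
y = torch.sigmoid(net1(x)) # computed from net1, so it is part of the autograd graph
print(y.requires_grad)          # True  -> this is what triggers the error
print(y.detach().requires_grad) # False -> fine to use as a target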
So you should try:
loss = loss_fcn(z, y.detach())
or:
loss = loss_fcn(z, y.data)
Or maybe this:
import torch
import torch.nn.functional as F
net1 = torch.nn.Linear(1,1)
net2 = torch.nn.Linear(1,1)
loss_fcn = torch.nn.BCELoss()
x = torch.zeros((1,1))
y = F.sigmoid(net1(x)) # make sure y is in range (0,1)
z = F.sigmoid(net2(y)) # make sure z is in range (0,1)
y.retain_grad() # y is not a leaf tensor, so ask autograd to keep its gradient
loss = loss_fcn(z, y.detach()) # detach the target so BCELoss does not try to differentiate it
loss.backward()
print(y.grad) # gradient of the loss w.r.t. y, flowing back through z only
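If you actually need gradients to flow into the target as well, one option (my own sketch, not something torch.nn.BCELoss supports) is to write the binary cross entropy by hand, since the handwritten expression is differentiable with respect to both arguments; the eps clamp is only there to avoid log(0), and its value is an arbitrary choice:
import torch
net1 = torch.nn.Linear(1,1)
net2 = torch.nn.Linear(1,1)
x = torch.zeros((1,1))
y = torch.sigmoid(net1(x)) # plays the role of the target, requires grad
z = torch.sigmoid(net2(y)) # plays the role of the input, requires grad
# hand-written binary cross entropy: autograd can differentiate this
# with respect to both z and y, unlike torch.nn.BCELoss
eps = 1e-7
z_c = z.clamp(eps, 1 - eps)
loss = -(y * torch.log(z_c) + (1 - y) * torch.log(1 - z_c)).mean()
loss.backward()
print(net1.weight.grad) # receives gradient through both the input path and the target path
print(net2.weight.grad)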
Upvotes: 1