flawr

Reputation: 11628

Derivative in both arguments of torch.nn.BCELoss()

When using torch.nn.BCELoss() on two arguments that are both results of some earlier computation, I get a curious error, which this question is about:

RuntimeError: the derivative for 'target' is not implemented

The MCVE is as follows:

import torch
import torch.nn.functional as F

net1 = torch.nn.Linear(1,1)
net2 = torch.nn.Linear(1,1)
loss_fcn = torch.nn.BCELoss()

x = torch.zeros((1,1))

y = F.sigmoid(net1(x)) #make sure y is in range (0,1)
z = F.sigmoid(net2(y)) #make sure z is in range (0,1)

loss = loss_fcn(z, y) #works if we replace y with y.detach()

loss.backward()

It turns out that if we call .detach() on y, the error disappears. But this results in a different computation: in the .backward() pass, the gradients with respect to the second argument of the BCELoss are no longer computed.

Can anyone explain what I am doing wrong here? As far as I know, all PyTorch modules in torch.nn should support computing gradients. And the error message seems to say that the derivative is not implemented for y, which is strange: the gradient of y can be computed, while that of y.detach() cannot, yet only the detached version is accepted, which seems contradictory.
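
For what it's worth, appending the following check to the MCVE confirms how the two tensors are seen by autograd (this is just my own sanity check, not part of the error output):

print(y.requires_grad)           # True  - y comes out of net1, so autograd tracks it
print(y.detach().requires_grad)  # False - the detached copy is treated as a constant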

Upvotes: 3

Views: 3248

Answers (2)

Nei Wu

Reputation: 11

I ran into the same problem. As far as I know, the second argument target of BCELoss(input, target) should be a tensor that does not require gradients, i.e. target.requires_grad should be False. But I don't know why.

Usually, the target (we can also call it the ground truth) does not require gradients. But your target (y in your code) was computed by F.sigmoid(net1(x)), which means the target, being the output of net1, is a tensor that requires gradients.
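
For comparison, here is a small sketch of the usual setup, where the target is a plain label tensor (the tensors and shapes below are made up just for illustration). Such a tensor has requires_grad == False by default, so BCELoss never needs a derivative for it:

import torch

loss_fcn = torch.nn.BCELoss()

pred = torch.sigmoid(torch.randn(4, 1, requires_grad=True))  # stand-in for a network output
labels = torch.tensor([[0.], [1.], [1.], [0.]])              # ordinary ground-truth labels

print(labels.requires_grad)    # False - plain tensors are not tracked by autograd
loss = loss_fcn(pred, labels)  # fine: no derivative w.r.t. the target is needed
loss.backward()                # the derivative w.r.t. the input is implemented, so this works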

So you should try:

loss = loss_fcn(z, y.detach())

or:

loss = loss_fcn(z, y.data)

Or maybe try something like this:

import torch
import torch.nn.functional as F

net1 = torch.nn.Linear(1,1)
net2 = torch.nn.Linear(1,1)
loss_fcn = torch.nn.BCELoss()

x = torch.zeros((1,1))

y = F.sigmoid(net1(x)) #make sure y is in range (0,1)
z = F.sigmoid(net2(y)) #make sure z is in range (0,1)

y.retain_grad()  # y is not a leaf tensor, so ask autograd to keep its gradient

loss = loss_fcn(z, y.detach())  # detach the target so BCELoss never needs its derivative

loss.backward()

print(y.grad)  # gradient of the loss w.r.t. y, flowing back through z only

Upvotes: 1

flawr

Reputation: 11628

It seems I misunderstood the error message. It is not y that does not allow computing gradients; rather, BCELoss() is not able to compute gradients with respect to its second argument. A similar problem was discussed here.
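
In case someone actually needs gradients with respect to both arguments: one possible workaround is to write the binary cross-entropy out with ordinary tensor operations, which autograd can differentiate with respect to both inputs. This is only a rough sketch of that idea, and the small epsilon for numerical stability is my own addition, not something BCELoss exposes:

import torch

net1 = torch.nn.Linear(1, 1)
net2 = torch.nn.Linear(1, 1)

x = torch.zeros((1, 1))
y = torch.sigmoid(net1(x))  # plays the role of the target
z = torch.sigmoid(net2(y))  # plays the role of the prediction

# binary cross-entropy written out by hand: -(y*log(z) + (1-y)*log(1-z))
eps = 1e-12  # guard against log(0)
loss = -(y * torch.log(z + eps) + (1 - y) * torch.log(1 - z + eps)).mean()

loss.backward()
print(net1.weight.grad)  # net1 now receives gradient through both arguments of the loss
print(net2.weight.grad)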

Upvotes: 1
