Reputation: 349
I am currently using PyTorch for deep neural networks. I wrote the toy neural network shown below and found that whether or not I set requires_grad=True for the label y makes a huge difference. When y.requires_grad=True, the neural network diverges. I am wondering why this happens.
import torch
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y = x.pow(2) + 10 * torch.rand(x.size())
x.requires_grad = True
# this is where the problem occurs
y.requires_grad = True
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)
        self.predict = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        x = self.predict(x)
        return x
net = Net(1, 10, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)
criterion = torch.nn.MSELoss()
for t in range(200):
    y_pred = net(x)
    loss = criterion(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    print("Epoch {}: {}".format(t, loss))
    optimizer.step()
Upvotes: 0
Views: 339
Reputation: 11490
It seems that you are using an outdated version of PyTorch. In more recent versions (0.4.0+), this raises the following error:
AssertionError: nn criterions don't compute the gradient w.r.t. targets -
please mark these tensors as not requiring gradients
Essentially, it tells you that this will only work if you set the requires_grad flag to False for your targets. Why it works at all in prior versions, and why it causes diverging behavior, is indeed a very interesting question.
My guess would be that a backward pass would then also change your targets (instead of only changing your weights), which is obviously not what you want.
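As a rough sketch of the suggestion above, the targets can simply be left at the default requires_grad=False, or detached if they came out of a tracked computation. The names target and losses, the Sequential stand-in for your Net class, the fixed seed, and the smaller learning rate of 0.05 are my own assumptions for illustration, not part of your original code:

```python
import torch

torch.manual_seed(0)  # assumed: fixed seed so the sketch is repeatable

x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y = x.pow(2) + 10 * torch.rand(x.size())

# Targets are plain data, so they should not require gradients.
# detach() returns a tensor that is cut off from the autograd graph;
# here it is a no-op safeguard because y never required grad.
target = y.detach()

# Stand-in for the Net class in the question (same layer sizes).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1),
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.05)  # assumed smaller step size
criterion = torch.nn.MSELoss()

losses = []
for t in range(200):
    loss = criterion(net(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print("first loss: {:.3f}, last loss: {:.3f}".format(losses[0], losses[-1]))
```

Since the targets contain uniform noise with a range of 10, the loss cannot get close to zero here; the point is only that it decreases steadily and stays finite.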
Upvotes: 1