Reputation: 137
So, I have a deep convolutional network with an LSTM layer. After the LSTM layer, the network splits into two branches that compute two different functions (using two different linear layers), and their results are added together to form the final network output.
To compute the loss so that the network can calculate gradients and update the weights, I apply a few operations to the output and then compute the loss between the derived value and the calculated target value.
def update(self, output, target):
    # target output is calculated outside the function
    # operations on output
    loss(output, target).backward()
    self.optimizer.step()
The network produces a non-zero loss (sometimes very small, sometimes several orders of magnitude larger). For example, a few of the losses:
tensor(1.00000e-04 *
5.7420)
tensor(2.7190)
tensor(0.9684)
It also has gradients as calculated here:
for param in self.parameters():
    print(param.grad.data.sum())
Which outputs:
tensor(1.00000e-03 *
1.9996)
tensor(1.00000e-03 *
2.6101)
tensor(1.00000e-02 *
-1.3879)
tensor(1.00000e-03 *
-4.5834)
tensor(1.00000e-02 *
2.1762)
tensor(1.00000e-03 *
3.6246)
tensor(1.00000e-03 *
6.6234)
tensor(1.00000e-02 *
2.9373)
tensor(1.00000e-02 *
1.2680)
tensor(1.00000e-03 *
1.8791)
tensor(1.00000e-02 *
1.7322)
tensor(1.00000e-02 *
1.7322)
tensor(0.)
tensor(0.)
tensor(1.00000e-03 *
-6.7885)
tensor(1.00000e-02 *
9.7793)
And:
tensor(2.4620)
tensor(0.9544)
tensor(-26.2465)
tensor(0.2280)
tensor(-219.2602)
tensor(-2.7870)
tensor(-50.8203)
tensor(3.2548)
tensor(19.6163)
tensor(-18.6029)
tensor(3.8564)
tensor(3.8564)
tensor(0.)
tensor(0.)
tensor(0.8040)
tensor(-0.1157)
But when I compare the weights before and after running the optimizer, they come out equal to each other.
Code to see if weights change:
before = list(neuralnet.parameters())
neuralnet.update()
after = list(neuralnet.parameters())
for i in range(len(before)):
    print(torch.equal(before[i].data, after[i].data))
The above returns True for each iteration.
Upvotes: 4
Views: 13765
Reputation: 943
When initializing the parameters, wrap them in the torch.nn.Parameter() class so that the optimizer can update them. If you are using pytorch < 0.4, try using torch.autograd.Variable(). For example:
import torch
import torch.utils.data
from torch import nn, optim
from torch.nn import functional as F

class TEMP(nn.Module):
    # Whole architecture
    def __init__(self):
        super(TEMP, self).__init__()
        self.input = nn.Parameter(torch.ones(1, requires_grad=True))  # <---- wrap it like this

    def forward(self, x):
        wt = self.input
        y = wt * x
        return y

model = TEMP()
optimizer = optim.Adam(model.parameters(), lr=0.001)
x = torch.randn(100)
y = 5 * x
loss = torch.sum((y - model(x)).pow(2))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(model.input)
Also note that if you are creating a tensor directly in pytorch >= 0.4, you need to set requires_grad=True if you want that tensor to be updated.
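To illustrate why the wrapping matters, here is a minimal sketch (the class names PlainTensor and Wrapped are made up for this example). It only counts what model.parameters() returns, since that is what gets handed to the optimizer; a bare tensor attribute is not registered, so the optimizer would never see it.

```python
import torch
from torch import nn


class PlainTensor(nn.Module):
    """Hypothetical module that stores its weight as a bare tensor."""

    def __init__(self):
        super().__init__()
        # Not registered with the module, even though requires_grad=True.
        self.weight = torch.ones(1, requires_grad=True)

    def forward(self, x):
        return self.weight * x


class Wrapped(nn.Module):
    """Hypothetical module that wraps its weight in nn.Parameter."""

    def __init__(self):
        super().__init__()
        # Registered with the module, so parameters() yields it.
        self.weight = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.weight * x


plain_count = len(list(PlainTensor().parameters()))
wrapped_count = len(list(Wrapped().parameters()))
print(plain_count, wrapped_count)
```

An optimizer built from PlainTensor().parameters() would have nothing to update, which is one way to end up with gradients but unchanged weights.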
Upvotes: 2
Reputation: 137
https://discuss.pytorch.org/t/gradients-exist-but-weights-not-updating/20484/2?u=wr01 has the answer I sought. The problem was that neuralnet.parameters() does not clone the parameters, so the before list held references to the same tensors that the optimizer updates in place. The weights were changing, but the change also showed up in the before variable, so the comparison always returned True.
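A minimal sketch of the difference, assuming a small nn.Linear as a stand-in for the real network (model, aliased and snapshot are names made up for this example). The aliased list behaves like the before variable in the question, while the cloned snapshot keeps the old values:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Stand-in for the real network; any nn.Module behaves the same way here.
model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# References to the live parameter tensors (what the question did).
aliased = list(model.parameters())
# Independent copies of the current values.
snapshot = [p.detach().clone() for p in model.parameters()]

# One ordinary training step on random data.
x = torch.randn(8, 3)
target = torch.randn(8, 1)
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()

after = list(model.parameters())
# Compares each tensor with itself, so this is always True.
aliased_equal = [torch.equal(b, a) for b, a in zip(aliased, after)]
# Compares the old values with the updated ones.
snapshot_equal = [torch.equal(b, a) for b, a in zip(snapshot, after)]
print(aliased_equal, snapshot_equal)
```

Cloning before the step is what makes the before/after comparison meaningful; the optimizer itself was working all along.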
Upvotes: 3