Reputation: 451
I am trying to use torch.nn.utils.clip_grad_norm_(), which requires an iterable of Tensors. See below:
for epoch in progress_bar(range(num_epochs)):
    lstm.train()
    outputs = lstm(trainX.to(device))
    optimizer.zero_grad()
    torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)
My code errors with:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-168-4cd34e6fd44d> in <module>
28 lstm.train()
29 outputs = lstm(trainX.to(device))
---> 30 torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)
31
32
/opt/conda/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py in clip_grad_norm_(parameters, max_norm, norm_type)
28 total_norm = max(p.grad.detach().abs().max() for p in parameters)
29 else:
---> 30 total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
31 clip_coef = max_norm / (total_norm + 1e-6)
32 if clip_coef < 1:
RuntimeError: stack expects a non-empty TensorList
If I examine lstm.parameters() I get a list of Parameters, instead of a list of Tensors:
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 1])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([1, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([1])
If I look at the first Parameter, it appears to be a list of Tensors:
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
...
Does anyone know what is going on here?
Upvotes: 0
Views: 1187
Reputation: 24726
PyTorch's clip_grad_norm_, as the name suggests, operates on gradients. You have to calculate your loss from outputs, call loss.backward(), and perform gradient clipping afterwards. Also, you should call optimizer.step() after this operation.
Something like this:
for epoch in progress_bar(range(num_epochs)):
    lstm.train()
    for batch in dataloader:
        optimizer.zero_grad()
        outputs = lstm(trainX.to(device))
        loss = my_loss(outputs, targets)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)
        optimizer.step()
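As a side note, clip_grad_norm_ also returns the total gradient norm it computed (before clipping), which can be handy for logging. A minimal, self-contained sketch with a hypothetical toy model, not your LSTM:

import torch
import torch.nn as nn

# hypothetical tiny model, only to show the return value
model = nn.Linear(4, 1)
out = model(torch.randn(8, 4))
out.sum().backward()

# clip_grad_norm_ returns the total norm of all gradients,
# measured before clipping, so you can monitor it during training
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad norm before clipping: {float(total_norm):.4f}")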
You don't have parameter.grad calculated (its value is None), and that's the reason for your error.
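To see this in isolation, here is a minimal sketch with a standalone nn.LSTM (not your exact model) showing that .grad only exists after backward(), which is why the clipping call ends up with nothing to stack:

import torch
import torch.nn as nn

# hypothetical standalone LSTM, only to illustrate when .grad appears
lstm = nn.LSTM(input_size=1, hidden_size=512)

# Before any backward pass, every parameter's .grad is None,
# so clip_grad_norm_ has no gradient tensors to work with.
print(all(p.grad is None for p in lstm.parameters()))       # True

x = torch.randn(5, 1, 1)           # dummy input: (seq_len, batch, input_size)
out, _ = lstm(x)
out.sum().backward()               # gradients are populated here

print(all(p.grad is not None for p in lstm.parameters()))   # True
torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)        # now it works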
Upvotes: 1