Reputation: 451
I am trying to use torch.nn.utils.clip_grad_norm_(), which requires an iterable of Tensors. See below:
for epoch in progress_bar(range(num_epochs)):
    lstm.train()
    outputs = lstm(trainX.to(device))
    optimizer.zero_grad()
    torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)
My code errors with:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-168-4cd34e6fd44d> in <module>
28 lstm.train()
29 outputs = lstm(trainX.to(device))
---> 30 torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)
31
32
/opt/conda/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py in clip_grad_norm_(parameters, max_norm, norm_type)
28 total_norm = max(p.grad.detach().abs().max() for p in parameters)
29 else:
---> 30 total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
31 clip_coef = max_norm / (total_norm + 1e-6)
32 if clip_coef < 1:
RuntimeError: stack expects a non-empty TensorList
If I examine lstm.parameters() I get a list of Parameters, instead of a list of Tensors:
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 1])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([2048])
<class 'torch.nn.parameter.Parameter'> torch.Size([1, 512])
<class 'torch.nn.parameter.Parameter'> torch.Size([1])
If I look at the first Parameter, it appears to be a list of Tensors:
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
<class 'torch.Tensor'> torch.Size([1])
...
Does anyone know what is going on here?
Upvotes: 0
Views: 1187
Reputation: 24726
PyTorch's clip_grad_norm_, as the name suggests, operates on gradients. You have to calculate your loss from outputs, call loss.backward(), and perform gradient clipping afterwards. Also, you should call optimizer.step() after this operation.
Something like this:
for epoch in progress_bar(range(num_epochs)):
    lstm.train()
    for batch in dataloader:
        optimizer.zero_grad()
        outputs = lstm(trainX.to(device))
        loss = my_loss(outputs, targets)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)
        optimizer.step()
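As a side note, clip_grad_norm_ also returns the total gradient norm it computed (before clipping), which can be handy for logging. A minimal, self-contained sketch with a hypothetical toy model, not your LSTM:

import torch
import torch.nn as nn

# hypothetical tiny model, only to show the return value
model = nn.Linear(4, 1)
out = model(torch.randn(8, 4))
out.sum().backward()

# clip_grad_norm_ returns the total norm of all gradients,
# measured before clipping, so you can monitor it during training
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad norm before clipping: {float(total_norm):.4f}")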
You don't have parameter.grad calculated (its value is None), and that's the reason for your error.
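To see this in isolation, here is a minimal sketch with a standalone nn.LSTM (not your exact model) showing that .grad only exists after backward(), which is why the clipping call ends up with nothing to stack:

import torch
import torch.nn as nn

# hypothetical standalone LSTM, only to illustrate when .grad appears
lstm = nn.LSTM(input_size=1, hidden_size=512)

# Before any backward pass, every parameter's .grad is None,
# so clip_grad_norm_ has no gradient tensors to work with.
print(all(p.grad is None for p in lstm.parameters()))       # True

x = torch.randn(5, 1, 1)           # dummy input: (seq_len, batch, input_size)
out, _ = lstm(x)
out.sum().backward()               # gradients are populated here

print(all(p.grad is not None for p in lstm.parameters()))   # True
torch.nn.utils.clip_grad_norm_(lstm.parameters(), 1)        # now it works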
Upvotes: 1