Reputation: 213
When training a model with OpenNMT-py, we get a dict as output, containing the weights and biases of the network. However, these tensors have requires_grad = False and therefore have no gradient. For example, with one layer, we might have the following tensors, denoting the embeddings as well as the weights and biases in the encoder and decoder. None of them has a gradient attached.
encoder.embeddings.emb_luts.0.weight
decoder.embeddings.emb_luts.0.weight
encoder.rnn.weight_ih_l0
encoder.rnn.weight_hh_l0
encoder.rnn.bias_ih_l0
encoder.rnn.bias_hh_l0
decoder.rnn.layers.0.weight_ih
decoder.rnn.layers.0.weight_hh
decoder.rnn.layers.0.bias_ih
decoder.rnn.layers.0.bias_hh
Can OpenNMT-py be made to set requires_grad = True via some option I have not found, or is there some other way to obtain the gradients of these tensors?
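For reference, this is roughly how I am inspecting the tensors; the checkpoint filename and the "model" key are assumptions about a typical OpenNMT-py checkpoint:

import torch

# Load a saved OpenNMT-py checkpoint (filename is hypothetical)
checkpoint = torch.load("model_step_1000.pt", map_location="cpu")

# The state dict holds plain tensors: requires_grad is False
# and no gradient is attached
for name, tensor in checkpoint["model"].items():
    print(name, tensor.requires_grad, tensor.grad)  # -> False None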
Upvotes: 0
Views: 94
Reputation: 11213
The gradients are accessible only inside the training loop, where optim.step() is called. If you want to log the gradients (or gradient norms, or similar) to TensorBoard, the best place to grab them is probably just before the optimizer step is called. That happens in the _gradient_accumulation method of the Trainer object.
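As a rough illustration (not OpenNMT-py's actual code; the SummaryWriter instance, log directory, and step counter are assumptions), logging gradient norms at that point could look like this:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/grad_norms")  # hypothetical log directory

def log_grad_norms(model, step):
    # After loss.backward(), each parameter's .grad holds its gradient;
    # read it before optim.step() and optim.zero_grad() run.
    for name, param in model.named_parameters():
        if param.grad is not None:
            writer.add_scalar("grad_norm/" + name, param.grad.norm().item(), step)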
Be aware that there are two places where optim.step() is called. Which one is used depends on whether you do the update after every batch or accumulate gradients from multiple batches and do the update afterward.
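Schematically, with a toy model standing in for the real one and accum_count as an assumed setting, the two patterns differ only in how often the step runs (accum_count = 1 is the per-batch case):

import torch
import torch.nn as nn

model = torch.nn.Linear(8, 2)  # toy stand-in for the translation model
optim = torch.optim.SGD(model.parameters(), lr=0.1)
accum_count = 4  # assumed accumulation factor

for i in range(16):
    x, y = torch.randn(4, 8), torch.randn(4, 2)  # dummy batch
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                     # gradients accumulate in .grad
    if (i + 1) % accum_count == 0:      # update every accum_count batches
        # .grad is complete here; inspect/log it before the step
        optim.step()
        optim.zero_grad()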
Upvotes: 0