thaumoctopus

Reputation: 213

Getting the gradients of a model trained in OpenNMT-py

When training a model with OpenNMT-py, we get a dict as output containing the weights and biases of the network. However, these tensors have requires_grad = False and therefore carry no gradient. For example, with one layer we might have the following tensors, covering the embeddings as well as the weights and biases of the encoder and decoder. None of them has a gradient attribute.

encoder.embeddings.emb_luts.0.weight

decoder.embeddings.emb_luts.0.weight

encoder.rnn.weight_ih_l0

encoder.rnn.weight_hh_l0

encoder.rnn.bias_ih_l0

encoder.rnn.bias_hh_l0

decoder.rnn.layers.0.weight_ih

decoder.rnn.layers.0.weight_hh

decoder.rnn.layers.0.bias_ih

decoder.rnn.layers.0.bias_hh

Can OpenNMT-py be made to set requires_grad = True through some option I have not found, or is there some other way to obtain the gradients of these tensors?
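For reference, loading a saved checkpoint shows this directly. A minimal sketch, assuming the usual OpenNMT-py checkpoint layout where the state dict is stored under a "model" key; model_step_1000.pt is a placeholder path:

    import torch

    # Placeholder path; OpenNMT-py writes checkpoints as .pt files.
    checkpoint = torch.load("model_step_1000.pt", map_location="cpu")

    # The saved weights are a plain state dict (assumed here to live under
    # the "model" key); the tensors are detached from any computation graph.
    for name, tensor in checkpoint["model"].items():
        print(name, tensor.requires_grad, tensor.grad)  # False, None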

Upvotes: 0

Views: 94

Answers (1)

Jindřich

Reputation: 11213

The gradients are accessible only inside the training loop, where optim.step() is called. If you want to log the gradients (or their norms, or anything derived from them) to TensorBoard, the best place to read them is just before the optimizer step. This happens in the _gradient_accumulation method of the Trainer object.
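A minimal sketch of such a logging hook, assuming you edit onmt/trainer.py and place the loop immediately before self.optim.step() inside _gradient_accumulation (the exact surrounding code varies between OpenNMT-py versions):

    # Inside Trainer._gradient_accumulation, just before self.optim.step():
    for name, param in self.model.named_parameters():
        if param.grad is not None:
            # Log the L2 norm of this parameter's gradient.
            print(name, param.grad.norm().item())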

Be aware that there are two places where optim.step() is called. Which one is used depends on whether you update after every batch or accumulate gradients over multiple batches and update afterward.
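If you would rather not track down both call sites, a version-independent alternative is to register gradient hooks on the parameters once, after the model is built; PyTorch calls them during every backward() pass regardless of where optim.step() happens. A sketch, assuming you can reach the model object before training starts:

    def make_hook(name):
        # Each hook receives the gradient tensor during backward().
        def hook(grad):
            print(name, grad.norm().item())
        return hook

    # Parameters have requires_grad = True, so Tensor.register_hook works.
    for name, param in model.named_parameters():
        param.register_hook(make_hook(name))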

Upvotes: 0
