Reputation: 91
Problem: a very long RNN:
N1 -- N2 -- ... -- N100
For an optimizer like AdamOptimizer, compute_gradients() returns gradients for all trainable variables.
However, the gradients might explode at some step.
A method like the one in how-to-effectively-apply-gradient-clipping-in-tensor-flow can clip the large final gradients (see the sketch at the end of this question).
But how do I clip the intermediate ones?
One way might be to do the backprop manually from "N100 --> N99", clip the gradients, then from "N99 --> N98", and so on, but that is far too complicated.
So my question is: is there an easier way to clip the intermediate gradients? (Of course, strictly speaking, they are no longer gradients in the mathematical sense.)
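For reference, the final-gradient clipping mentioned above looks roughly like this; a minimal sketch assuming a TF 1.x AdamOptimizer and an already-built scalar loss (the clip norm of 5.0 is just a placeholder value):
```
import tensorflow as tf

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
# Gradients of the loss with respect to every trainable variable.
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
# Clip only the final, fully backpropagated gradients by their global norm.
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped_grads, variables))
```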
Upvotes: 8
Views: 1545
Reputation: 465
```
import tensorflow as tf

@tf.custom_gradient
def gradient_clipping(x):
    # Forward pass is the identity; the backward pass clips the incoming gradient to norm 10.
    return x, lambda dy: tf.clip_by_norm(dy, 10.0)
```
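The decorated function leaves the forward value unchanged and only clips the gradient flowing back through it. A quick way to check the behavior, a sketch assuming TF 2.x-style eager execution with arbitrary input values:
```
x = tf.constant([100.0, 200.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    # Wrap the intermediate tensor; forward value is unchanged.
    y = tf.reduce_sum(gradient_clipping(x) ** 2)
print(tape.gradient(y, x))  # the returned gradient has norm at most 10
```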
Upvotes: 2
Reputation: 5206
You can use the custom_gradient decorator to make a version of tf.identity that clips exploding intermediate gradients.
```
import tensorflow as tf
from tensorflow.contrib.eager.python import tfe

@tfe.custom_gradient
def gradient_clipping_identity(tensor, max_norm):
    result = tf.identity(tensor)

    def grad(dresult):
        # Clip the gradient flowing back through the identity; max_norm gets no gradient.
        return tf.clip_by_norm(dresult, max_norm), None

    return result, grad
```
Then use gradient_clipping_identity as you'd normally use identity, and your gradients will be clipped in the backward pass.
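For instance, you could insert it after every step of an unrolled RNN loop; a rough sketch (the cell, inputs, and clip norm of 10.0 are hypothetical, not from the answer above):
```
# Hypothetical unrolled loop: clip the gradient flowing back through the state at every step.
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=16)
state = cell.zero_state(batch_size=2, dtype=tf.float32)
inputs = tf.random_normal([100, 2, 8])  # 100 time steps, batch 2, feature size 8
for t in range(100):
    output, state = cell(inputs[t], state)
    state = gradient_clipping_identity(state, 10.0)
```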
Upvotes: 0