Reputation: 331
All,
When you train a large model on a large number of samples, some samples may produce NaN gradients during the parameter update.
I want to find these samples, and at the same time I don't want that batch's gradients to update the model's parameters, because that could turn the parameters into NaN.
So does anyone have a good idea for dealing with this problem?
My code is like below:
# Create an optimizer.
params = tf.trainable_variables()
opt = tf.train.AdamOptimizer(1e-3)
gradients = tf.gradients(self.loss, params)
max_gradient_norm = 10
clipped_gradients, self.gradient_norms = tf.clip_by_global_norm(
    gradients, max_gradient_norm)
self.optimizer = opt.apply_gradients(zip(clipped_gradients, params))
Upvotes: 5
Views: 5494
Reputation: 34026
You could use tf.is_nan in combination with tf.cond to only execute the rest of your code if the loss is not NaN.
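A minimal sketch of that pattern, assuming TF 1.x graph mode and the loss, opt, clipped_gradients, and params from the question; do_update and skip_update are just illustrative names:
import tensorflow as tf

# Scalar bool: True when the loss contains no NaN.
loss_is_finite = tf.logical_not(tf.reduce_any(tf.is_nan(loss)))

def do_update():
    # Apply the clipped gradients, then return a flag so both branches
    # of tf.cond yield the same structure (a boolean tensor).
    with tf.control_dependencies([opt.apply_gradients(zip(clipped_gradients, params))]):
        return tf.constant(True)

def skip_update():
    # Leave the parameters untouched for this batch.
    return tf.constant(False)

# True if the update was applied, False if the batch was skipped.
did_update = tf.cond(loss_is_finite, do_update, skip_update)
If you keep a per-example loss tensor before reducing it to a scalar, applying tf.is_nan to that tensor also tells you which samples in the batch caused the NaN.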
Upvotes: 0
Reputation: 3358
You can check whether your gradients contain NaN or Inf with tf.check_numerics:
grad_check = [tf.check_numerics(g, "NaN or Inf in clipped gradient")
              for g in clipped_gradients]
with tf.control_dependencies(grad_check):
    self.optimizer = opt.apply_gradients(zip(clipped_gradients, params))
The grad_check ops raise InvalidArgumentError if any of the clipped gradients contain NaN or infinity. The tf.control_dependencies makes sure that grad_check is evaluated before the gradients are applied.
Also see tf.add_check_numerics_ops().
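For the latter, a minimal sketch assuming TF 1.x graph mode, where train_op stands for the question's self.optimizer and the graph has already been built:
# Adds an assertion for every floating-point tensor in the graph.
check_op = tf.add_check_numerics_ops()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Raises tf.errors.InvalidArgumentError naming the first op that
    # produced NaN/Inf, which helps locate the offending batch.
    sess.run([train_op, check_op])
Since it instruments every float op in the graph, this is mainly a debugging tool rather than something to leave in a production training loop.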
Upvotes: 10