Reputation: 305
In order to reduce the memory footprint of my tensors, I defined all the variables with dtype=tf.float16 in my model, and then defined the optimizer:
optimizer = tf.train.AdamOptimizer(self.learning_rate)
self.compute_gradients = optimizer.compute_gradients(self.mean_loss_reg)
train_adam_op = optimizer.apply_gradients(self.compute_gradients, global_step=self.global_step)
Everything works OK, but after I run train_adam_op, the gradients and variables are nan in Python. I wonder if the apply_gradients() API supports the tf.float16 type. Why do I get nan after apply_gradients() is called by session.run()?
Upvotes: 1
Views: 901
Reputation: 1469
The dynamic range of fp16 is quite limited compared to that of 32-bit floats: the largest representable fp16 value is about 65504. As a result, it's easy to overflow or underflow, which often produces the NaN values you've encountered.
You can insert a few tf.check_numerics operations in your model to help pinpoint the specific operation(s) that become unstable when performed in fp16.
For example, you can wrap an L2 loss operation as follows to check that its result fits in an fp16:
A = tf.nn.l2_loss(some_tensor)
becomes
A = tf.check_numerics(tf.nn.l2_loss(some_tensor), "found the root cause")
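To see this in action, here is a minimal self-contained sketch (assuming TF 1.x, to match the session.run() usage in the question) in which an fp16 L2 loss overflows and check_numerics fails immediately with a descriptive error instead of silently propagating inf through the graph:

import tensorflow as tf

# fp16 tops out around 65504, so 1000**2 / 2 = 500000 overflows to inf.
x = tf.constant(1000.0, dtype=tf.float16)
loss = tf.check_numerics(tf.nn.l2_loss(x), "l2_loss overflowed in fp16")

with tf.Session() as sess:
    sess.run(loss)  # raises InvalidArgumentError carrying the message above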
The most common sources of overflow and underflow are exp(), log(), and the various classification primitives (such as softmax cross-entropy), so I would start looking there.
Once you've figured out which sequence of operations is problematic, you can update your model to perform that sequence in 32-bit floats: use tf.cast() to convert its inputs to float32, and cast the result back to fp16.
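As an illustrative sketch (the actual unstable sequence depends on your model; logits and labels here stand in for whatever fp16 tensors feed it), if a softmax cross-entropy were the problematic part, the cast-up/cast-down pattern would look like this:

# Promote the inputs of the unstable sequence to float32...
logits32 = tf.cast(logits, tf.float32)
labels32 = tf.cast(labels, tf.float32)
# ...perform the numerically sensitive computation in float32...
loss32 = tf.nn.softmax_cross_entropy_with_logits(labels=labels32, logits=logits32)
# ...and cast the result back to fp16 for the rest of the graph.
loss = tf.cast(loss32, tf.float16)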
Upvotes: 4