Reputation: 305
In order to reduce the memory footprint of my tensors, I defined all the variables with dtype=tf.float16 in my model, and then defined the optimizer:
optimizer = tf.train.AdamOptimizer(self.learning_rate)
self.compute_gradients = optimizer.compute_gradients(self.mean_loss_reg)
train_adam_op = optimizer.apply_gradients(self.compute_gradients, global_step=self.global_step)
Everything works OK, but after I run train_adam_op, the gradients and variables are nan in Python. I wonder if the apply_gradients() API supports the tf.float16 type. Why do I get nan after apply_gradients() is called by session.run()?
Upvotes: 1
Views: 901
Reputation: 1469
The dynamic range of fp16 is quite limited compared to that of 32-bit floats: the largest representable fp16 value is about 65504. As a result, it's easy to overflow or underflow, which often produces the NaN values you've encountered.
You can insert a few tf.check_numerics operations in your model to help pinpoint the specific operation(s) that become unstable when performed in fp16.
For example, you can wrap an L2 loss operation as follows to check that its result fits in an fp16:
A = tf.nn.l2_loss(some_tensor)
becomes
A = tf.check_numerics(tf.nn.l2_loss(some_tensor), "found the root cause")
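To see this in action, here is a minimal self-contained sketch (assuming TF 1.x, to match the session.run() usage in the question) in which an fp16 L2 loss overflows and check_numerics fails immediately with a descriptive error instead of silently propagating inf through the graph:

import tensorflow as tf

# fp16 tops out around 65504, so 1000**2 / 2 = 500000 overflows to inf.
x = tf.constant(1000.0, dtype=tf.float16)
loss = tf.check_numerics(tf.nn.l2_loss(x), "l2_loss overflowed in fp16")

with tf.Session() as sess:
    sess.run(loss)  # raises InvalidArgumentError carrying the message above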
The most common sources of overflow and underflow are exp(), log(), and the various classification primitives (such as softmax cross-entropy), so I would start looking there.
Once you've figured out which sequence of operations is problematic, you can update your model to perform that sequence in 32-bit floats: use tf.cast() to convert its inputs to float32, and cast the result back to fp16.
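As an illustrative sketch (the actual unstable sequence depends on your model; logits and labels here stand in for whatever fp16 tensors feed it), if a softmax cross-entropy were the problematic part, the cast-up/cast-down pattern would look like this:

# Promote the inputs of the unstable sequence to float32...
logits32 = tf.cast(logits, tf.float32)
labels32 = tf.cast(labels, tf.float32)
# ...perform the numerically sensitive computation in float32...
loss32 = tf.nn.softmax_cross_entropy_with_logits(labels=labels32, logits=logits32)
# ...and cast the result back to fp16 for the rest of the graph.
loss = tf.cast(loss32, tf.float16)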
Upvotes: 4