rosecatherinek

Reputation: 31

Tensorflow: minimize L2 loss on int64 data without casting to float32 because casting gives "no gradients" error

I am running into the error "No gradients provided for any variable" when I cast my tensor to float32. But without casting, I get the error that the expected type is float, not int. So either way, I can't seem to find a way to proceed...

In my setting, I am trying to minimize the squared error of the difference of two tensors.

softmax_w = tf.Variable(tf.zeros([SIZE_LSTM_UNITS, NUM_CLASSES], dtype=tf.float32))
softmax_b = tf.Variable(tf.zeros([NUM_CLASSES], dtype=tf.float32))
logits = tf.matmul(out, softmax_w) + softmax_b

If I compute the loss with casting as below:

predDiff = tf.cast(tf.sub(tf.arg_max(logits, 1), tf.arg_max(train_labels, 1)), tf.float32)
l2loss = tf.nn.l2_loss(predDiff)
trainStep = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(l2loss)

where logits and train_labels are 1-hot vectors, then I get the following error:

trainStep = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(l2loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 198, in minimize
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 309, in apply_gradients
    (converted_grads_and_vars,))
ValueError: No gradients provided for any variable: ((None, <tensorflow.python.ops.variables.Variable object at 0x7f2c7363bf90>), (None, <tensorflow.python.ops.variables.Variable object at 0x7f2ce284e9d0>), (None, <tensorflow.python.ops.variables.Variable object at 0x7f2ce284e510>), (None, <tensorflow.python.ops.variables.Variable object at 0x7f2ce26cf050>), (None, <tensorflow.python.ops.variables.Variable object at 0x7f2ce26cf450>), (None, <tensorflow.python.ops.variables.Variable object at 0x7f2ce2c9d510>), (None, <tensorflow.python.ops.variables.Variable object at 0x7f2ce287ae90>))

Instead, if I compute the loss without casting as below:

predDiff = tf.sub(tf.arg_max(logits, 1), tf.arg_max(train_labels, 1))

then, I get the following error:

trainStep = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(l2loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 196, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 238, in compute_gradients
    self._assert_valid_dtypes([loss])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 379, in _assert_valid_dtypes
    dtype, t.name, [v for v in valid_dtypes]))
ValueError: Invalid type tf.int64 for L2Loss:0, expected: [tf.float32, tf.float64, tf.float16].

However, if I use the cross-entropy loss as below, everything works fine.

crossEnt = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, train_labels))

That said, I would like to use the L2 loss because eventually I compute the RMSE to compare performance. I'm not sure if I am missing something obvious. Any help would be appreciated.
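For context, the RMSE I eventually want is along the lines of this sketch, where pred_classes and true_classes are hypothetical names standing in for the two arg_max outputs above:

# Sketch only: pred_classes / true_classes stand in for the arg_max outputs above
sq_err = tf.square(tf.cast(pred_classes - true_classes, tf.float32))
rmse = tf.sqrt(tf.reduce_mean(sq_err))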

Upvotes: 3

Views: 1263

Answers (2)

Kevinj22

Reputation: 1066

I'm not sure if this would help, but since your predictions and targets are already ints, maybe this could work, as both tf.subtract and tf.multiply work with ints:

self.diff = tf.subtract(self.predictions, self.targets) # Compute difference
self.diff = tf.multiply(self.diff, self.diff, name='diff') # Square the difference
self.loss = tf.reduce_sum(self.diff) # Compute the sum
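If this integer-valued loss is then passed to tf.train.GradientDescentOptimizer, the dtype check quoted in the question (expected: [tf.float32, tf.float64, tf.float16]) suggests a cast would still be needed; one possible sketch (not part of the original snippet) is:

self.loss = tf.cast(tf.reduce_sum(self.diff), tf.float32) # cast the int sum of squared errors to float32 so the optimizer's dtype check accepts it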

Upvotes: 0

kempy

Reputation: 616

All the weights in your network must be learnable. For this to be true, the ops must be differentiable: we must be able to apply gradients through them. We cannot apply gradients to an x - y function that maps integers to integers, so I think the issue is where you are casting to float.

Instead of this:

predDiff = tf.cast(tf.sub(tf.arg_max(logits, 1), tf.arg_max(train_labels, 1)), tf.float32)

Try casting before applying arg_max and sub:

float_logits = tf.cast(logits, tf.float32)
float_labels = tf.cast(train_labels, tf.float32)
predDiff = tf.sub(tf.arg_max(float_logits, 1), tf.arg_max(float_labels, 1))

This way, we can actually calculate and apply gradients for sub and arg_max.

Upvotes: 1
