Reputation: 31
I'm trying to create a custom loss function whose target output is an integer (which is converted to a one-hot encoding inside the loss function).
The problem is that one_hot does not have differentiable gradients. Are there any workarounds?
def new_loss(hidden, output, random_size=20):
    # Cast the integer targets and build a one-hot matrix
    output1 = tf.cast(output, dtype=tf.int32)
    one_hot = tf.one_hot(output1, num_words, dtype=tf.int32)
    one_hot = tf.cast(one_hot, dtype=tf.float32)
    score = K.dot(hidden, one_hot)
    # Negative sampling: score random words against the hidden state
    random_words = tf.random.uniform((random_size,), maxval=num_words, dtype=tf.dtypes.int32)
    random_words_1_hot = tf.one_hot(random_words, num_words, dtype=tf.float32)
    scores = K.dot(random_words_1_hot, hidden)
    average = K.sum(K.log(1 - K.sigmoid(scores)) / random_size)
    return -1 * K.log(K.sigmoid(score)) - average
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Upvotes: 3
Views: 993
Reputation: 4543
The problem is not in the one_hot encoding itself, but rather in the series of cast operations. More specifically, TensorFlow won't propagate gradients through integer tensors. Assuming both hidden and output are of type float, if you change this
output1 = tf.cast(output, dtype=tf.int32)
one_hot = tf.one_hot(output1, num_words, dtype=tf.int32)
one_hot = tf.cast(one_hot, dtype=tf.float32)
to this
one_hot = tf.one_hot(tf.cast(output, tf.int32), num_words, dtype=tf.float32)
you'll get your gradients.
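As a quick sanity check, here is a minimal, self-contained sketch of that fixed pattern (assuming TF 1.x graph mode; num_words and the tensor values are made up for illustration). The gradient with respect to hidden comes back as a real tensor rather than None:

import tensorflow as tf  # assumes TF 1.x (graph mode)

num_words = 5                                # hypothetical vocabulary size
hidden = tf.constant([[1., 2.], [3., 4.]])   # float "hidden" activations
output = tf.constant([0., 3.])               # float tensor holding integer class ids
# One-hot built directly as float32: the int cast only touches the indices
one_hot = tf.one_hot(tf.cast(output, tf.int32), num_words, dtype=tf.float32)
loss = tf.reduce_sum(tf.matmul(hidden, one_hot))
print(tf.gradients(loss, [hidden]))          # a real tensor, not [None]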
More detailed example:
import numpy as np
import tensorflow as tf  # TF 1.x (graph mode)

num_words = 2    # vocabulary size; 2 keeps the matmul shapes consistent
random_size = 3  # number of negative samples

# one_hot1 is created directly as float32, so gradients can flow through it
one_hot1 = tf.one_hot(tf.cast(np.random.rand(2), tf.int32), num_words, dtype=tf.float32)
hidden = tf.constant([1., 2., 3., 4.], shape=(2, 2))
one_hot = tf.cast(one_hot1, dtype=tf.float32)   # float -> float cast keeps the gradient
hidden1 = tf.cast(hidden, tf.float32)           # likewise a float -> float cast
score = tf.matmul(hidden1, one_hot)
random_words = tf.random.uniform((random_size,), maxval=num_words, dtype=tf.float32)
random_words_1_hot = tf.one_hot(tf.cast(random_words, tf.int32), num_words, dtype=tf.float32)
scores = tf.matmul(random_words_1_hot, hidden1)
average = tf.reduce_sum(tf.log(1 - tf.sigmoid(scores)) / random_size)
res = -1 * tf.log(tf.sigmoid(score)) - average
grads = tf.gradients(res, [hidden1, one_hot1])  # both gradients are non-None
sess = tf.Session()
print(sess.run(res))
print(sess.run(grads))
I used core TF operations just for the sake of consistency. You can see that if one_hot1 is initially created as tf.int32 and then recast to float32, there will be no gradient. More about this here: https://github.com/tensorflow/tensorflow/issues/20524
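For contrast, a minimal sketch of the broken pattern (again TF 1.x, with made-up tensor values): when the one-hot tensor is first materialized as int32, the later cast back to float32 cannot recover the gradient, so tf.gradients returns None for it:

import tensorflow as tf  # assumes TF 1.x (graph mode)

num_words = 5                                # hypothetical vocabulary size
hidden = tf.constant([[1., 2.], [3., 4.]])
output = tf.constant([0., 3.])
# Broken: one-hot created as int32, then recast to float32
one_hot_int = tf.one_hot(tf.cast(output, tf.int32), num_words, dtype=tf.int32)
one_hot = tf.cast(one_hot_int, tf.float32)
loss = tf.reduce_sum(tf.matmul(hidden, one_hot))
print(tf.gradients(loss, [one_hot_int]))     # [None]: no gradient through the int tensor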
Upvotes: 2