Reputation: 33
I am trying to write a Keras 2 LSTM with a custom loss function via TensorFlow:
model.compile(loss=in_top_k_loss, optimizer='rmsprop', metrics=[bin_crossent_true_only, 'binary_crossentropy', 'mean_squared_error', 'accuracy'])
My training set has examples with different sizes of the time dimension, hence I use train_on_batch, where each batch consists only of instances with the same time dimension. The batch size is 256.
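Roughly, my training loop looks like this (a sketch; names like batches_by_length and n_epochs are placeholders for my real data pipeline):

# Illustrative training loop: each batch groups 256 instances that share
# the same time dimension t, so x_batch has shape (256, t, n_features).
for epoch in range(n_epochs):
    for x_batch, y_batch in batches_by_length:
        model.train_on_batch(x_batch, y_batch)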
The following code throws a very nasty exception in the first epoch (when train_on_batch is first called):
import tensorflow as tf
import keras
from keras import backend as K

# takes 2 1D arrays of equal length, returns a single value (the negative of my own "precision" measure)
def in_top_k_loss_single(y_true, y_pred):
    # indices of the positive entries in y_true
    y_true_labels = tf.cast(tf.transpose(tf.where(y_true > 0))[0], tf.int32)
    y_pred = tf.reshape(y_pred, [1, tf.shape(y_pred)[0]])
    # tf.nn.top_k returns a (values, indices) pair
    y_topk_tensor = tf.nn.top_k(y_pred, k=7)
    y_topk_ixs = y_topk_tensor[0][0][:7]
    y_topk = y_topk_tensor[1][0][:7]
    y_topk_len = tf.cast(tf.count_nonzero(y_topk_ixs), tf.int32)
    y_topk = y_topk[:y_topk_len]
    y_topk0 = tf.expand_dims(y_topk, 1)
    y_true_labels0 = tf.expand_dims(y_true_labels, 0)
    re = tf.cast(tf.reduce_any(tf.equal(y_topk0, y_true_labels0), 1), tf.int32) / tf.range(1, y_topk_len + 1)
    return (-1) * tf.where(tf.equal(tf.reduce_sum(y_pred), tf.constant(0.0)),
                           tf.constant(0.0),
                           tf.cast(tf.reduce_mean(re), tf.float32))
# takes 2 matrices of equal sizes,
# applies the function above to y_true[i] & y_pred[i] for each row i,
# returns a single value (mean of all row-wise values)
def in_top_k_loss(y_true, y_pred):
    # if I change `in_top_k_loss_single` to `keras.metrics.binary_crossentropy` (for instance) it runs
    return K.mean(tf.map_fn(lambda x: in_top_k_loss_single(x[0], x[1]), (y_true, y_pred), dtype=tf.float32))
where in_top_k_loss is my custom loss function in the Keras model. These functions seem to work when I test them separately with different inputs (even tricky ones). It seems that only Keras has problems with them; perhaps it expects different datatypes/shapes/etc.
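For example, a standalone test along these lines (the tensors are illustrative, not my real data) evaluates without errors:

y_true_test = tf.constant([[0., 1., 0., 1., 0., 0., 0., 0.],
                           [1., 0., 0., 0., 0., 1., 0., 0.]])
y_pred_test = tf.constant([[0.1, 0.9, 0.2, 0.7, 0.0, 0.3, 0.1, 0.6],
                           [0.8, 0.1, 0.05, 0.05, 0.2, 0.9, 0.1, 0.3]])
with tf.Session() as sess:
    # forward evaluation works; the problem only appears during training
    print(sess.run(in_top_k_loss(y_true_test, y_pred_test)))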
I tried some smart ideas from the Internet: changing the batch size, changing the optimizer and clipping the gradient; no success. Calling evaluate before train_on_batch also did not help.
The rest of the code works with losses from Keras as well as with losses like this one:
def bin_crossent_true_only(y_true, y_pred):
    return (1 + keras.backend.sum(y_pred)) * keras.metrics.binary_crossentropy(y_true, y_true * y_pred)
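For instance, compiling with that loss instead (same setup otherwise) trains fine:

model.compile(loss=bin_crossent_true_only, optimizer='rmsprop', metrics=['binary_crossentropy'])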
The function in_top_k_loss works and returns meaningful results if used in the metrics array. None of the input (y_true, y_pred) is NaN. y_true may contain 0s and 1s (zero or more 1s per row, i.e. per instance of the training set).
The exception itself:
Traceback (most recent call last):
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 491, in apply_op
    preferred_dtype=default_dtype)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 702, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 110, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 99, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 360, in make_tensor_proto
    raise ValueError("None values not supported.")
ValueError: None values not supported.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\models.py", line 941, in train_on_batch
    class_weight=class_weight)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1620, in train_on_batch
    self._make_train_function()
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1002, in _make_train_function
    self.total_loss)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\optimizers.py", line 210, in get_updates
    new_a = self.rho * a + (1. - self.rho) * K.square(g)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 1225, in square
    return tf.square(x)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\math_ops.py", line 384, in square
    return gen_math_ops.square(x, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2733, in square
    result = _op_def_lib.apply_op("Square", x=x, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 504, in apply_op
    values, as_ref=input_arg.is_ref).dtype.name
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 702, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 110, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 99, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 360, in make_tensor_proto
    raise ValueError("None values not supported.")
ValueError: None values not supported.
Upvotes: 2
Views: 1209
Reputation: 126154
The optimizers in TensorFlow require that the loss function be differentiable, which means that every operation between the loss result and the variables in the TensorFlow graph must have defined gradients. The tf.where() operation does not have defined gradients, which means that the overall loss function is not differentiable. The result of trying to compute the gradients of a non-differentiable function in TensorFlow is None, which leads to the error you are seeing when Keras tries to update the variables.
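You can see this directly by asking TensorFlow for the symbolic gradients of the loss; a quick sketch (assuming the compiled model from the question is in scope) would be:

import tensorflow as tf

# Any weight whose gradient comes back as None sits behind an op
# (such as tf.where) that has no registered gradient.
grads = tf.gradients(model.total_loss, model.trainable_weights)
print([w.name for w, g in zip(model.trainable_weights, grads) if g is None])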
Upvotes: 4