Reputation: 2539
I have some issues making this custom loss function work (it checks whether y_pred
is ordered coherently with the real ordering indices provided by y_true):
def custom_objective(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    ordered_output = tf.cast(tf.nn.top_k(-y_pred, k=5)[1], tf.float32)
    return tf.sqrt(tf.reduce_mean(tf.square(ordered_output - y_true), axis=-1))
I can properly run it with sample data:
with tf.Session() as sess:
    print(custom_objective(tf.constant([0, 1, 2, 3, 4, 5]),
                           tf.constant([0.0, 0.9, 0.2, 0.3, 0.5, 0.8])).eval())  # 1.82574
But somehow it doesn't work if I use it in model.compile, as it raises:
/Users/luca/.virtualenvs/python3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
358 else:
359 if values is None:
--> 360 raise ValueError("None values not supported.")
361 # if dtype is provided, forces numpy array to be the type
362 # provided if possible.
ValueError: None values not supported.
Note that there are no None values in my training set. If I change ordered_output = tf.cast(tf.nn.top_k(-y_pred, k=5)[1], tf.float32)
to ordered_output = -y_pred,
the model compiles fine and starts training properly (but it's clearly not the loss function I want).
I have the subtle feeling that there might be something wrong with using top_k
in a loss function, as I don't see how it could be differentiable, but I don't have better ideas for evaluating differences in predicted ordering. Hints/ideas/papers/references? :)
Upvotes: 3
Views: 3978
Reputation: 11543
This might be voted down as I won't really fix your code, but here goes nothing:
Indeed, I don't believe you can use top_k as an objective function, just like you can't use accuracy as an objective function.
The reason is mathematical. Even though keras, tensorflow, theano and co. are awesome tools for AI and allow everybody to fiddle with neural nets, the latter still remain very complex mathematical tools. That math is well hidden under the hood, but you should be aware of it when trying to go further than the prefabricated tools.
What happens when you train a network is that you compute how wrong the network is on an example and you backpropagate that error to learn from it. The algorithms behind that backpropagation are optimizers, more precisely gradient-based optimizers. Computing a gradient requires differentiating the function that we are optimizing, the loss/objective function. This means that the objective must be differentiable.

Accuracy isn't a differentiable function: it takes as input a real number between 0 and 1 and outputs a step-like function: 0 if x < 0.5 and 1 if x > 0.5. That function isn't differentiable because we can't get its gradient at 0.5, and its gradient is zero everywhere else, so there is nothing to backpropagate. The top_k function is some kind of accuracy function. So in my opinion you cannot use it in an objective, because under the hood the smart tensorflow has to compute the gradients of your function.
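The zero-gradient problem can be demonstrated numerically. The sketch below (not from the original question, just an illustration with plain NumPy-free Python) takes a finite-difference gradient of a step-like 0/1 loss and of a smooth squared-error surrogate: the step loss gives the optimizer no signal anywhere away from the threshold, while the smooth loss always indicates which way to move.

```python
def step_loss(y_pred, threshold=0.5):
    # 0/1 "accuracy-style" loss: 1 if the prediction is below the threshold,
    # 0 otherwise -- a step function, like accuracy or a hard top_k match.
    return 1.0 if y_pred < threshold else 0.0

def squared_loss(y_pred, y_true=1.0):
    # Smooth surrogate: differentiable everywhere.
    return (y_pred - y_true) ** 2

def numerical_gradient(f, x, eps=1e-6):
    # Central finite-difference approximation of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The step loss is flat on both sides of the threshold: gradient is 0,
# so a gradient-based optimizer learns nothing from it.
print(numerical_gradient(step_loss, 0.3))     # 0.0
print(numerical_gradient(step_loss, 0.8))     # 0.0

# The squared loss has a useful gradient: 2 * (0.3 - 1.0) = -1.4.
print(numerical_gradient(squared_loss, 0.3))
```

This is why training needs a smooth surrogate (e.g. squared error, cross-entropy, or a pairwise ranking loss) as the objective, while step-like quantities such as accuracy or top_k agreement are only reported as metrics.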
I hope this helps :)
Upvotes: 7