jhonatanoliveira

Reputation: 141

An equivalent but differentiable argmax expression in TensorFlow

I need a one-hot representation for the maximum value in a tensor.

For example, consider a tensor 2 x 3:

[ [1, 5, 2],
  [0, 3, 7] ]

The one-hot-argmax representation I am aiming for looks like this:

[ [0, 1, 0],
  [0, 0, 1] ]

I can do it as follows, where my_tensor is a N x 3 tensor:

position = tf.argmax(my_tensor, axis=1)      # Shape (N,)
one_hot_pos = tf.one_hot(position, depth=3)  # Shape (N, 3)
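For context, here is a minimal runnable sketch of this non-differentiable version on the 2 x 3 example above (assuming TensorFlow 2.x in eager mode; the values are just the sample from the question):

import tensorflow as tf

# The sample tensor from the question (N = 2 rows, 3 columns).
my_tensor = tf.constant([[1., 5., 2.],
                         [0., 3., 7.]])

position = tf.argmax(my_tensor, axis=1)      # [1, 2], shape (N,)
one_hot_pos = tf.one_hot(position, depth=3)  # [[0, 1, 0], [0, 0, 1]], shape (N, 3)

# tf.argmax has no gradient, so a tf.GradientTape would return None for the
# gradient of one_hot_pos with respect to my_tensor; that is the problem below.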

But this part of the code needs to be differentiable since I'm training over it. My workaround was as follows, where EPSILON = 1e-3 is a small constant:

max_value = tf.reduce_max(my_tensor, axis=1, keepdims=True)  # Shape (N, 1)
clip_min = max_value - EPSILON
# The row maximum maps to 1, anything more than EPSILON below it maps to 0,
# and values in between get a fractional score.
one_hot_pos = (tf.clip_by_value(my_tensor, clip_min, max_value) - clip_min) / (max_value - clip_min)
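Run on the same sample tensor, a minimal sketch of this workaround (assuming TensorFlow 2.x, with a tf.GradientTape used only to confirm that a gradient is defined):

import tensorflow as tf

EPSILON = 1e-3
my_tensor = tf.constant([[1., 5., 2.],
                         [0., 3., 7.]])

with tf.GradientTape() as tape:
    tape.watch(my_tensor)
    max_value = tf.reduce_max(my_tensor, axis=1, keepdims=True)
    clip_min = max_value - EPSILON
    one_hot_pos = (tf.clip_by_value(my_tensor, clip_min, max_value) - clip_min) / (max_value - clip_min)

print(one_hot_pos.numpy())  # [[0. 1. 0.]
                            #  [0. 0. 1.]]

# Unlike the argmax version, this returns a tensor rather than None:
print(tape.gradient(tf.reduce_sum(one_hot_pos), my_tensor))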

The workaround works most of the time, but - as expected - it has some issues:

Do you know a better way of simulating argmax followed by one_hot that fixes the two issues mentioned above, using only differentiable TensorFlow functions?

Upvotes: 4

Views: 929

Answers (1)

Nwoye CID

Reputation: 944

Do some maximum, tile, and comparison operations, like:

a = tf.Variable([ [1, 5, 2], [0, 3, 7] ])  # your tensor

m = tf.reduce_max(a, axis=1)  # [5, 7]
m = tf.expand_dims(m, -1)     # [[5], [7]]
m = tf.tile(m, [1, 3])        # [[5, 5, 5], [7, 7, 7]]

y = tf.cast(tf.equal(a, m), tf.float32)  # [[0, 1, 0], [0, 0, 1]]

This trick builds the one-hot mask with ordinary tensor operations and keeps it differentiable.
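For a quick forward check, the same steps condensed into a runnable snippet (assuming TensorFlow 2.x eager mode, using tf.constant instead of tf.Variable):

import tensorflow as tf

a = tf.constant([[1, 5, 2],
                 [0, 3, 7]])

m = tf.tile(tf.expand_dims(tf.reduce_max(a, axis=1), -1), [1, 3])
y = tf.cast(tf.equal(a, m), tf.float32)

print(y.numpy())  # [[0. 1. 0.]
                  #  [0. 0. 1.]]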

Upvotes: 1
