Reputation: 437
I am working on the problem of sentence translation from English into German, so the final output is a German sequence and I need to check how good my predictions are.
I found the following loss function in a TensorFlow tutorial:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask
    return tf.reduce_mean(loss_)
But I don't know what this function does. I thought (maybe I am wrong) that we cannot use SparseCategoricalCrossentropy on sequences in a straightforward manner and that we have to do some kind of manipulation first. But in the code above I see that SparseCategoricalCrossentropy is applied directly to a sequence output. Why?
What does the mask variable do? Can you explain the code?
EDIT: tutorial: https://www.tensorflow.org/tutorials/text/nmt_with_attention
Upvotes: 0
Views: 532
Reputation: 1441
mask in mask = tf.math.logical_not(tf.math.equal(real, 0)) is taking care of the padding.

In your batch you have sentences of different lengths, and you 0-pad all of them to equal length (think of "I have an apple" vs. "It's a good day to play football in the sun").

But it doesn't make sense to include the 0-padded positions in the loss calculation. So the code first finds the indices where real is 0, and then uses multiplication to make the loss contribution of those positions 0.
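To make this concrete, here is a minimal, self-contained sketch of what the masking does. The batch size, sentence length, vocabulary size, and the target IDs are all made up for illustration; 0 is assumed to be the padding ID, as in the tutorial:

import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

# Toy batch: 2 target sentences of max length 4, vocabulary size 5.
# The second sentence is shorter, so it is padded with 0s.
real = tf.constant([[3, 1, 4, 2],
                    [2, 4, 0, 0]])        # 0 = padding token
pred = tf.random.normal((2, 4, 5))        # logits: (batch, time, vocab)

# reduction='none' gives one loss value per token, shape (2, 4),
# which is why the loss can be applied to the whole sequence at once.
per_token_loss = loss_object(real, pred)

# 1.0 where there is a real token, 0.0 where there is padding.
mask = tf.cast(tf.math.logical_not(tf.math.equal(real, 0)),
               dtype=per_token_loss.dtype)

print(mask)                    # [[1. 1. 1. 1.], [1. 1. 0. 0.]]
print(per_token_loss * mask)   # padded positions contribute 0

Note that tf.reduce_mean in the tutorial's loss_function then averages over all positions, including the masked ones that contribute 0.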
Upvotes: 1