Reputation: 437
I am working on the problem of sentence translation from English into German, so the final output is a German sequence and I need to check how good my predictions are.
I found the following loss function in a TensorFlow tutorial:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask
    return tf.reduce_mean(loss_)
But I don't know what this function does. I thought (maybe I am wrong) that we cannot use SparseCategoricalCrossentropy on sequences in a straightforward manner and that we have to do some kind of manipulation first. But in the code above I see that SparseCategoricalCrossentropy is applied directly to a sequence output. Why?
What does the mask variable do? Can you explain the code?
EDIT: tutorial: https://www.tensorflow.org/tutorials/text/nmt_with_attention
Upvotes: 0
Views: 532
Reputation: 1441
mask in mask = tf.math.logical_not(tf.math.equal(real, 0)) is taking care of the padding.

In your batch you have sentences of different lengths, and you 0-pad all of them to equal length (think of "I have an apple" vs. "It's a good day to play football in the sun").

But it doesn't make sense to include the 0-padded positions in the loss calculation. So the code first finds the indices where real is 0, and then uses multiplication to make the loss contribution of those positions 0.
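To make this concrete, here is a minimal, self-contained sketch of what the masking does. The batch size, sentence length, vocabulary size, and the target IDs are all made up for illustration; 0 is assumed to be the padding ID, as in the tutorial:

import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

# Toy batch: 2 target sentences of max length 4, vocabulary size 5.
# The second sentence is shorter, so it is padded with 0s.
real = tf.constant([[3, 1, 4, 2],
                    [2, 4, 0, 0]])        # 0 = padding token
pred = tf.random.normal((2, 4, 5))        # logits: (batch, time, vocab)

# reduction='none' gives one loss value per token, shape (2, 4),
# which is why the loss can be applied to the whole sequence at once.
per_token_loss = loss_object(real, pred)

# 1.0 where there is a real token, 0.0 where there is padding.
mask = tf.cast(tf.math.logical_not(tf.math.equal(real, 0)),
               dtype=per_token_loss.dtype)

print(mask)                    # [[1. 1. 1. 1.], [1. 1. 0. 0.]]
print(per_token_loss * mask)   # padded positions contribute 0

Note that tf.reduce_mean in the tutorial's loss_function then averages over all positions, including the masked ones that contribute 0.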
Upvotes: 1