enterML

Reputation: 2285

Padding time dimension in softmax output for CTC loss

Network:

Input sequence -> BiLSTM ---------> BiLSTM ---------> Dense with softmax
Output shapes:    (None, 5, 256)    (None, 5, 128)    (None, 5, 11)
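
For completeness, a minimal sketch of how such a model could be built (the input feature size of 64 is an assumption; the LSTM unit counts of 128 and 64 are implied by the bidirectional output shapes, and 11 = 10 classes + 1 CTC blank):

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(5, 64))  # feature size 64 is an assumption
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)  # (None, 5, 256)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)        # (None, 5, 128)
outputs = layers.Dense(11, activation="softmax")(x)                        # (None, 5, 11)
model = tf.keras.Model(inputs, outputs)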

Here is my CTC loss:

import tensorflow as tf

def calculate_ctc_loss(y_true, y_pred):
    # Number of samples in the batch
    batch_length = tf.cast(tf.shape(y_true)[0], dtype="int64")
    # Number of time steps produced by the network
    input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
    # Length of the (padded) label sequences
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")

    # ctc_batch_cost expects per-sample lengths of shape (batch, 1)
    input_length = input_length * tf.ones(shape=(batch_length, 1), dtype="int64")
    label_length = label_length * tf.ones(shape=(batch_length, 1), dtype="int64")

    loss = tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
    return loss
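
For context, the loss is attached at compile time as a regular Keras loss (the optimizer choice here is arbitrary):

model.compile(optimizer="adam", loss=calculate_ctc_loss)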

There are 10 classes in total. For the first batch, with a batch size of 16, the shapes are:

y_true: (16, 7)
y_pred: (16, 5, 11)

I tried to pad the time dimension in y_pred so that its shape becomes (16, 7, 11), but the loss became NaN.
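
The padding attempt looked roughly like this (a sketch, assuming the extra time steps were zero-padded with tf.pad):

import tensorflow as tf

# Append two all-zero time steps so y_pred goes from (16, 5, 11) to (16, 7, 11)
y_pred_padded = tf.pad(y_pred, paddings=[[0, 0], [0, 2], [0, 0]])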

Question: How do I correctly pad the time dimension in this case so that y_true and y_pred have compatible shapes for the CTC calculation?

Upvotes: 0

Views: 106

Answers (0)
