Reputation: 1538
I'm trying to use CTC for speech recognition using Keras and have tried the CTC example here. In that example, the input to the CTC Lambda layer is the output of the softmax layer (y_pred). The Lambda layer calls ctc_batch_cost, which internally calls TensorFlow's ctc_loss, but the TensorFlow ctc_loss documentation says that the ctc_loss function performs the softmax internally, so you don't need to softmax your input first. I think the correct usage is to pass inner to the Lambda layer, so that softmax is only applied once, inside the ctc_loss function. I have tried the example and it works. Should I follow the example or the TensorFlow documentation?
Upvotes: 10
Views: 5731
Reputation: 723
The loss used in the code you posted is different from the one you linked. The loss actually used in the code is found here.
The Keras code performs some pre-processing before calling ctc_loss that makes the input suitable for the required format. On top of requiring the input to not be softmax-ed, TensorFlow's ctc_loss also expects the dims to be NUM_TIME, BATCHSIZE, FEATURES. Keras's ctc_batch_cost does both of these things in this line.
It applies log(), which gets rid of the softmax scaling, and it also shuffles the dims so that the tensor is in the right shape. When I say it gets rid of the softmax scaling, it obviously does not restore the original tensor, but rather relies on softmax(log(softmax(x))) = softmax(x). See below:
import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e_x / e_x.sum()

x = np.array([1, 2, 3])
y = softmax(x)
z = np.log(y)    # z =/= x (obviously) BUT
yp = softmax(z)  # yp == y #####
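The pre-processing itself amounts to roughly the following. This is a sketch of the idea, not the exact Keras source; the epsilon constant is just an assumed small value to avoid log(0):
import tensorflow as tf

# y_pred comes in as (BATCHSIZE, NUM_TIME, FEATURES), already softmax-ed.
# Shuffle dims to (NUM_TIME, BATCHSIZE, FEATURES) and take the log, so the
# softmax inside TensorFlow's ctc_loss reproduces the original probabilities.
epsilon = 1e-8  # assumed small constant to avoid log(0)
y_pred_for_ctc = tf.log(tf.transpose(y_pred, perm=[1, 0, 2]) + epsilon)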
Upvotes: 8