Reputation: 1538
I'm trying to use CTC for speech recognition using Keras and have tried the CTC example here. In that example, the input to the CTC Lambda layer is the output of the softmax layer (y_pred). The Lambda layer calls ctc_batch_cost, which internally calls TensorFlow's ctc_loss, but the TensorFlow ctc_loss documentation says that the ctc_loss function performs the softmax internally, so you don't need to softmax your input first. I think the correct usage is to pass inner to the Lambda layer, so that softmax is only applied once, inside the ctc_loss function. I have tried the example and it works. Should I follow the example or the TensorFlow documentation?
Upvotes: 10
Views: 5731
Reputation: 723
The loss used in the code you posted is different from the one you linked. The loss actually used in the code is found here.
The Keras code performs some pre-processing before calling ctc_loss that makes the input suitable for the required format. On top of requiring the input to not be softmax-ed, TensorFlow's ctc_loss also expects the dims to be NUM_TIME, BATCHSIZE, FEATURES. Keras's ctc_batch_cost does both of these things in this line.
It applies log(), which gets rid of the softmax scaling, and it also shuffles the dims so that the tensor is in the right shape. When I say it gets rid of the softmax scaling, it obviously does not restore the original tensor, but rather relies on softmax(log(softmax(x))) = softmax(x). See below:
import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e_x / e_x.sum()

x = np.array([1, 2, 3])
y = softmax(x)
z = np.log(y)    # z =/= x (obviously) BUT
yp = softmax(z)  # yp == y #####
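The pre-processing itself amounts to roughly the following. This is a sketch of the idea, not the exact Keras source; the epsilon constant is just an assumed small value to avoid log(0):
import tensorflow as tf

# y_pred comes in as (BATCHSIZE, NUM_TIME, FEATURES), already softmax-ed.
# Shuffle dims to (NUM_TIME, BATCHSIZE, FEATURES) and take the log, so the
# softmax inside TensorFlow's ctc_loss reproduces the original probabilities.
epsilon = 1e-8  # assumed small constant to avoid log(0)
y_pred_for_ctc = tf.log(tf.transpose(y_pred, perm=[1, 0, 2]) + epsilon)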
Upvotes: 8