Reputation: 11
Recently, I try to use tensorflow to implement a cnn+ctc network base on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks.
I try to feed batch spectrogram data (shape:(10,120,155,3),batch_size is 10) into 10 convolution layer and 3 fully connected layer. So the output before connecting the ctc layer is 2d data(shape:(10,1024)).
Here is my problem: I want to use tf.nn.ctc_loss function in tensorflow library,but it generate the ValueError: Dimension must be 2 but is 3 for 'transpose'(op:'Transpose') with input shapes:[?,1024],[3].
I guess the error is related to the dimension of my 2d input data. The discription of the ctc_loss function in tensorflow official site is require a 3d input with the shape (batch_size x max_time x num_classes).
So, what is the extra dimension of 'num_classes' ? what should I change the shape of my cnn+fc output data?
Upvotes: 1
Views: 3273
Reputation: 21
The fully connected layer should be applied per time step. It's like applying same dense layer per time step in recurrent neural network. For output of convolution layer, time step is width.
So for example, output shape would be:
It is expected shape for ctc_loss in tensorflow.
Upvotes: 2