Reputation: 1307
I need to train a bidirectional LSTM model to recognize discrete speech (individual numbers from 0 to 9). I have recorded speech from 100 speakers. What should I do next? (Assume I am splitting the recordings into individual .wav files containing one number per file.) I will be using MFCCs as features for the network.
Further, I would like to know how the dataset would differ if I use a library that supports CTC (Connectionist Temporal Classification).
Upvotes: 5
Views: 3411
Reputation: 318
You can use the answer/guidance provided here.
Depending on which library you are using to build your LSTM (PyBrain, Theano, Keras), you can look through its documentation.
I would recommend Theano (Binary LSTM link) or Keras (Tutorial) for this because they are fairly simple to understand and well documented.
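On the dataset question, here is a rough sketch of how the two layouts usually differ. It is plain Python with no audio or deep learning library, so every name in it (`pad_sequences`, `classification_example`, `ctc_example`, `dummy_mfcc`) is hypothetical and the MFCC frames are dummy zeros with an assumed 13 coefficients. With one digit per .wav file, each example is a variable-length sequence of MFCC frames plus a single class label, and sequences are typically padded to a common length for batching. With CTC, each example keeps the whole utterance and carries a sequence of labels without frame-level alignment, and the output layer needs one extra "blank" class (which index the blank takes depends on the library).

```python
from typing import Dict, List, Tuple

Frame = List[float]          # one MFCC vector (13 coefficients assumed here)
NUM_DIGITS = 10              # classes 0..9
NUM_CTC_CLASSES = NUM_DIGITS + 1  # CTC adds one blank symbol


def dummy_mfcc(n_frames: int, n_coeffs: int = 13) -> List[Frame]:
    """Stand-in for real MFCC extraction: n_frames vectors of zeros."""
    return [[0.0] * n_coeffs for _ in range(n_frames)]


def pad_sequences(seqs: List[List[Frame]],
                  pad_value: float = 0.0) -> Tuple[List[List[Frame]], List[int]]:
    """Right-pad each sequence to the longest one; also return true lengths."""
    max_len = max(len(s) for s in seqs)
    n_coeffs = len(seqs[0][0])
    lengths = [len(s) for s in seqs]
    padded = [
        s + [[pad_value] * n_coeffs for _ in range(max_len - len(s))]
        for s in seqs
    ]
    return padded, lengths


def classification_example(frames: List[Frame], digit: int) -> Dict:
    """Isolated digit: one file maps to exactly one label in 0..9."""
    if not 0 <= digit < NUM_DIGITS:
        raise ValueError("digit must be in 0..9")
    return {"features": frames, "label": digit}


def ctc_example(frames: List[Frame], digits: List[int]) -> Dict:
    """CTC: one utterance maps to a label sequence with no alignment."""
    # CTC needs a blank between repeated labels, so the input must have
    # at least len(digits) + (number of adjacent repeats) frames.
    repeats = sum(1 for a, b in zip(digits, digits[1:]) if a == b)
    if len(frames) < len(digits) + repeats:
        raise ValueError("too few frames for this label sequence")
    return {
        "features": frames,
        "labels": digits,
        "input_length": len(frames),
        "label_length": len(digits),
    }


# Isolated-digit dataset: one label per file, padded for batching.
files = [dummy_mfcc(30), dummy_mfcc(42)]
padded, lengths = pad_sequences(files)
isolated = [classification_example(f, d) for f, d in zip(padded, [7, 3])]

# CTC dataset: a longer recording with several digits and no segmentation.
continuous = ctc_example(dummy_mfcc(120), [4, 4, 2])
```

If you keep one digit per file, CTC is not strictly needed, since a plain softmax over 10 classes on the final (or pooled) BiLSTM output is enough. CTC becomes useful when you want to skip the manual splitting and train on recordings that contain several digits.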
Hope this helps.
Upvotes: 4