How to prepare a dataset for speech recognition

Question

I need to train a bidirectional LSTM model to recognize discrete speech (individual numbers from 0 to 9). I have recorded speech from 100 speakers. What should I do next? (Suppose I split the recordings into individual .wav files containing one number per file.) I will be using MFCCs as features for the network.

Further, I would like to know how the dataset would differ if I use a library that supports CTC (Connectionist Temporal Classification).

Nirbhay Tandon · Accepted Answer

You can use the answer/guidance provided here

Depending on which library you are using to build your LSTM (PyBrain, Theano, Keras), you can look through its documentation.

I would recommend using Theano (Binary LSTM link) or Keras (Tutorial) for this because they are fairly simple to understand and are well documented.
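As a rough sketch of what the prepared data might look like, assuming you have already computed one MFCC matrix per .wav file with shape (frames, 13): the random arrays below are stand-ins for real features, and the helper names `pad_sequences` and `one_hot` are my own, not part of any particular library. It pads the variable-length utterances to a common length for a plain LSTM classifier, then shows the extra arrays a CTC loss typically expects instead of one-hot targets.

```python
import numpy as np

def pad_sequences(features, pad_value=0.0):
    """Pad a list of (frames, n_mfcc) arrays to (batch, max_frames, n_mfcc).

    Also returns the original frame count of each utterance.
    """
    lengths = np.array([f.shape[0] for f in features], dtype=np.int64)
    max_len = int(lengths.max())
    n_mfcc = features[0].shape[1]
    batch = np.full((len(features), max_len, n_mfcc), pad_value, dtype=np.float32)
    for i, f in enumerate(features):
        batch[i, :f.shape[0], :] = f
    return batch, lengths

def one_hot(labels, num_classes=10):
    """Turn integer digit labels into one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

# Stand-in data (assumption): three utterances of different lengths, 13 MFCCs each.
rng = np.random.default_rng(0)
features = [rng.standard_normal((n, 13)).astype(np.float32) for n in (40, 55, 32)]
digits = [3, 7, 0]  # the digit spoken in each file

# Plain classification setup: padded inputs and one target per utterance.
X, input_lengths = pad_sequences(features)
y = one_hot(digits)

# CTC setup: integer label sequences plus the length of each input and label.
# For isolated digits every label sequence has length 1.
ctc_labels = np.array(digits, dtype=np.int64).reshape(-1, 1)
label_lengths = np.ones(len(digits), dtype=np.int64)

print(X.shape, y.shape, ctc_labels.shape)
```

For a classifier, `X` and `y` are usually enough. With CTC you would normally pass `X`, `input_lengths`, `ctc_labels` and `label_lengths`; which index is reserved for the blank symbol depends on the library, so check its documentation. With 100 speakers it is also usually worth splitting train and test sets by speaker, so that no speaker appears in both.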

Hope this helps.
