Reputation: 1200
Has anybody seen any samples showing how to set up a simple application that trains a DNN and then uses it to recognize a limited number of voice commands, without being tied to a particular language? I believe the Kaldi API is powerful enough for this, but the documentation is lacking.
Upvotes: 0
Views: 510
Reputation: 25220
1) Take an existing DNN model or train one yourself. You can use the Tedlium experiment from Kaldi, which is free to run. It does not matter if the model was trained for English; it will work for other languages too.
2) Extract DNN posteriors from both the training keyphrases and the audio you want to recognize. The nnet3-am-compute tool can be used for that: it takes a DNN model and returns phonetic or state posteriors for every frame.
3) Implement a DTW (dynamic time warping) algorithm to compare the DNN posteriors. You have to write this part yourself; it is not implemented in Kaldi.
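For step 3, here is a minimal sketch of what the DTW comparison could look like in Python with NumPy. It is only an illustration, not Kaldi code: it assumes the posteriors are already loaded as arrays of shape (frames, classes), and the function names `dtw_distance` and `recognize`, the negative-log dot product frame distance, and the optional rejection threshold are all my own choices here.

```python
import numpy as np


def dtw_distance(query, template, eps=1e-10):
    """Length-normalised DTW cost between two posteriorgrams.

    Both inputs are arrays of shape (frames, classes) whose rows are
    posterior distributions. The frame distance (an assumption of this
    sketch) is -log of the dot product of the two posterior vectors.
    """
    query = np.asarray(query, dtype=float)
    template = np.asarray(template, dtype=float)
    n, m = len(query), len(template)

    # Local distance between every query frame and every template frame.
    local = -np.log(np.maximum(query @ template.T, eps))

    # acc[i, j] = cheapest cost of aligning the first i query frames
    # with the first j template frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev

    # Normalise so long and short utterances are comparable.
    return acc[n, m] / (n + m)


def recognize(query, templates, threshold=None):
    """Return (best_label, scores) for a query posteriorgram.

    `templates` maps a command label to a list of example posteriorgrams.
    If `threshold` is given and the best score exceeds it, the label is
    None (the query is rejected as matching no known command).
    """
    scores = {
        label: min(dtw_distance(query, t) for t in examples)
        for label, examples in templates.items()
    }
    best = min(scores, key=scores.get)
    if threshold is not None and scores[best] > threshold:
        return None, scores
    return best, scores
```

You would record a few examples per command, store their posteriors as templates, and call `recognize` on the posteriors of each new utterance. The double loop is slow for long recordings but should be acceptable for short commands; the threshold would have to be tuned on your own data.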
Related papers describing the algorithm:
Query-By-Example Spoken Term Detection Using Phonetic Posteriorgram Templates
Upvotes: 0