How to use Deespeech (v0.5.1) effectively and use of language models during training and inference?

Question

I am trying to train and use a model using Deepspeech v0.5.1 for English. My aim to train two models, one with and without a language model. Request your help on several fronts please. Sorry this is long but trying be as detailed as possible; and also, being new to Linux and data-science I may be stating some very obvious things. Thank you in advance for your help. Since SO said the original form was spam, I am posting and answering this question with further background information. Regards, Rohit

B) My questions:

B1) When using a language model either for training or inference, do I HAVE to specify the lm_binary parameter AND the corresponding trie file? Can using only the trie work?

B2) Irrespective of whether a language model was used while training the model (binaryFile and trie together), later when the model is used for inference, can I choose to use OR not use a language model? Can a different language model be used later or only the one used for training? Are there things to note while choosing an alternative model? E.g. training using a 3-gram model but using a 4-gram model during inference? Is there anything else like this you can think of?

B3) Suppose my model is already built by training on a vocabulary file, arpa, trie and lm_binary built from only 10k data points. Say I create a new vocabulary called BigVocabulary.file from a larger corpus than the one used for training. E.g. the entire 629731 data points in validated.tsv file; use bigger vocabulary to create the .arpa, lmBinary and trie files. I ensure that the valid characters are exactly the same by comparing the alphabet files. Then on the model trained with smaller vocabulary, can I use BigVocabulary.binary.file and BigVocabulary.trie while doing inference using the command?

I already created a model with only first 1000 files and inference is poor but works. Command:

deepspeech \ --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \ --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \ --lm /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/lm/lm4gram/vocabulary-Set5First1050_4gram.klm \ --trie /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/trie/trie4gram/Set5First1050_4gram.trie \ --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.

Console output:

(dpsp5v051basic) rohit@DE-W-0246802:~/dpspCODE/v051/DeepSpeech$ deepspeech \ --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \ --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \ --lm /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/lm/lm4gram/vocabulary-Set5First1050_4gram.klm \ --trie /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/trie/trie4gram/Set5First1050_4gram.trie \ --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.wav Loading model from file /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb TensorFlow: v1.13.1-10-g3e0cc53 DeepSpeech: v0.5.1-0-g4b29b78 Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage. 2019-08-01 16:11:02.155443: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-08-01 16:11:02.179690: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant 2019-08-01 16:11:02.179740: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant 2019-08-01 16:11:02.179756: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant 2019-08-01 16:11:02.179891: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant Loaded model in 0.0283s. Loading language model from files /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/lm/lm4gram/vocabulary-Set5First1050_4gram.klm /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/trie/trie4gram/Set5First1050_4gram.trie Loaded language model in 0.068s. Running inference. a on a in a is the Inference took 0.449s for 3.041s audio file.

But if i use the BigVocabulary.trie and lmBinary files then I get an error saying the trie file version mismatches and to update the trie file.

But it still seems to load the language model. So did Deepspeech actually pick it up and apply it correctly? How do I fix this error?

Command:

deepspeech \ --model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \ --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \ --lm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm \ --trie /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie \ --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.wav

Console output:

(dpsp5v051basic) rohit@DE-W-0246802:~/dpspCODE/v051/DeepSpeech$ deepspeech \

--model /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb \ --alphabet /home/rohit/dpspTraining/data/wavFiles/commVoiceSet5-1kTotal/alphabetDir/alphabet-Set5First1050.txt \ --lm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm \ --trie /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie \ --audio /home/rohit/dpspTraining/data/wavFiles/wav33/test/File28.wav Loading model from file /home/rohit/dpspTraining/models/v051/model8-validFirst1k-yesLM-4gram/savedModel/output_graph.pb TensorFlow: v1.13.1-10-g3e0cc53 DeepSpeech: v0.5.1-0-g4b29b78 Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage. 2019-08-01 16:11:58.305524: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2019-08-01 16:11:58.322902: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant 2019-08-01 16:11:58.322945: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant 2019-08-01 16:11:58.322956: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant 2019-08-01 16:11:58.323063: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant Loaded model in 0.0199s. Loading language model from files /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/lm/lm4gram/vocabulary-allValidated_o4gram.klm /home/rohit/dpspTraining/data/wavFiles/testVocabAllValidated/trie/trie4gram/allValidated_o4gram.trie Error: Trie file version mismatch (4 instead of expected 3). Update your trie file. Loaded language model in 0.00368s. Running inference. an on o tn o as te tee Inference took 1.893s for 3.041s audio file.

Thank you for your time.

How to use Deespeech (v0.5.1) effectively and use of language models during training and inference?

Answers (1)

Related Questions