Shamanu4

Reputation: 5406

Voice recognition with Julius. How to make a .voca file?

I'm building a voice recognition system, and Julius gives fairly good results for this. Words from the sample .voca file are recognized perfectly, but how do I add my own words and transcriptions to the file?

I've tried the latest VoxForge (http://www.voxforge.org/) release and the nightly builds of the acoustic models with their vocabulary, but I get a lot of errors at Julius startup, like this:

Error: voca_load_htkdict: line 19: triphone "r-d+v" not found
Error: voca_load_htkdict: line 19: triphone "d-v+aa" not found
Error: voca_load_htkdict: the line content was: 2   [AARDVARK]  aa r d v aa r k
Error: voca_load_htkdict: begin missing phones
Error: voca_load_htkdict: r-d+v
Error: voca_load_htkdict: d-v+aa
Error: voca_load_htkdict: end missing phones
Error: init_voca: error in reading /usr/src/custom/julius/quickstart/grammar/sample.dict
ERROR: failed to read dictionary "/usr/src/custom/julius/quickstart/grammar/sample.dict"
ERROR: m_fusion: some error occured in reading grammars
ERROR: Error in loading model

Does anyone know the rules for word transcription in .voca files?

Upvotes: 0

Views: 2596

Answers (1)

Muhammad Dorgham

Reputation: 36

Reason for the error: Julius outputs these messages when your word dictionary contains words that the acoustic model was not trained on. "voca_load_htkdict.c" tries to match the triphones in the dict file against the triphone list in the acoustic model; when it does not find one, it reports this error and stops the program.
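To see which dictionary entries would trigger this error before starting Julius, the matching can be approximated offline. The following is only a rough sketch, not Julius's actual implementation: the function names are hypothetical, it assumes each dict line has the layout shown in the log above (category id, bracketed output string, then phones), and it checks only the full triphones inside a word, ignoring how Julius treats the first and last phone.

```python
def word_internal_triphones(phones):
    """Return the full 'left-center+right' triphones inside one word."""
    return [f"{l}-{c}+{r}" for l, c, r in zip(phones, phones[1:], phones[2:])]


def load_hmm_names(tiedlist_lines):
    """Collect the triphone names from the first column of an HMM list."""
    names = set()
    for line in tiedlist_lines:
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def missing_triphones(dict_line, hmm_names):
    """List the triphones of one dict line that the HMM list does not define."""
    fields = dict_line.split()
    phones = fields[2:]  # assumption: category id, [OUTPUT], then the phones
    return [t for t in word_internal_triphones(phones) if t not in hmm_names]


if __name__ == "__main__":
    # Hypothetical HMM list that lacks two of the triphones AARDVARK needs.
    hmm_names = load_hmm_names(["aa-r+d", "v-aa+r", "aa-r+k"])
    print(missing_triphones("2   [AARDVARK]  aa r d v aa r k", hmm_names))
```

With a real model, the lines passed to load_hmm_names would come from the tiedlist file shipped with the acoustic model, and each line of the .dict file would be checked in turn.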

Possible solutions:

1. Enable the -forcedict option (or uncomment it in the jconf file) to skip the offending words in the dictionary and force Julius to run.
2. Map each "not found" triphone to the closest physical triphone in the HMM list file ("tiedlist"). For example:

   b-ey+t v-eh+t

   The first column is the name of the triphone (generated from your dictionary), and the second column is the name of the HMM actually defined in your AM. This is only practical when just a few triphones are missing, not many.
3. The best solution is not to include words in your dict file that are not covered by the acoustic model.

Note that the first two solutions are for testing Julius only; for production or commercial projects you must train the acoustic model and the language model on the same corpus.
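Leaving out the words the acoustic model does not cover can also be done mechanically. The sketch below makes some assumptions: the helper name is hypothetical, dict lines are assumed to follow the "category id, [OUTPUT], phones" layout from the error log, and only triphones inside a word are checked, so Julius may still reject an entry for other reasons.

```python
def filter_dictionary(dict_lines, hmm_names):
    """Split dict lines into (kept, dropped) by triphone coverage."""
    kept, dropped = [], []
    for line in dict_lines:
        phones = line.split()[2:]  # assumption: id, [OUTPUT], then phones
        triphones = (
            f"{l}-{c}+{r}" for l, c, r in zip(phones, phones[1:], phones[2:])
        )
        # Keep the word only if every inner triphone exists in the model.
        if all(t in hmm_names for t in triphones):
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical data: the model knows "aa-r+k" only.
    lines = ["2 [AARDVARK] aa r d v aa r k", "2 [ARK] aa r k"]
    kept, dropped = filter_dictionary(lines, {"aa-r+k"})
    print("kept:", kept)
    print("dropped:", dropped)
```

The dropped list is worth reviewing by hand, since it shows exactly which words would need either a tiedlist mapping or a retrained model.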

Upvotes: 2
