Reputation: 11
I tried to train the tesseract 4.1 using OCRD project but after training completed I copied the lang.traineddata but getting above error. The tesseractWiki page is very confusing to understand asking to use combine_lang_model after making lstmf file. So Actually I have the lstmf file. I created these file by using tif/box pair. Please help me for further step.
Upvotes: 0
Views: 1459
Reputation: 181
Related discussions:Failed to load any lstm-specific dictionaries for lang xxx
Suppose your training folder like this:
OCRD/makefile
OCRD/data/foo-ground-truth.
You could try as following steps:
Find the WORDLIST_FILE/NUMBERS_FILE/PUNC_FILE in the makefile, and change them to:
WORDLIST_FILE := data/$(MODEL_NAME).wordlist NUMBERS_FILE := data/$(MODEL_NAME).numbers PUNC_FILE := data/$(MODEL_NAME).punc
Suppose your base traineddata is eng.traineddata.
2.1 Download the .wordlist/.numbers/.punc files from the langdata_lstm.
2.2 Place them in OCRD/data
2.3 if the MODEL_NAME = foo, rename them as: foo.wordlist, foo.numbers, foo.punc
if you don't have the base traineddata, you could try this too. But if your base traineddata is afr, you should download the files from langdata_lstm/afr.
The cause of this error: In OCRD, the default path of the above three files is $ (OUTPUT_DIR) = data / $ (MODEL_NAME), and all files in this path are automatically generated during the training process.
If the variable START_MODEL is not assigned, the makefile will not generate any related files under this path;
If the variable START_MODEL has been assigned, the foo.lstm-number-dawg、foo.lstm-punc-dawg、foo.lstm-word-dawg and so on will be produced in data / $ (MODEL_NAME). But they are not the right one. So there may be a bug in OCRD.
Upvotes: 1