R.Kulkarni

Reputation: 11

"failed to load any lstm-specific dictionaries for lang " tesseract 4.1

I tried to train Tesseract 4.1 using the OCRD project, but after training completed and I copied the lang.traineddata file, I get the error above. The Tesseract wiki page is confusing: it says to use combine_lang_model after making the lstmf files. I already have the lstmf files, which I created from tif/box pairs. What should the next steps be?

Upvotes: 0

Views: 1459

Answers (1)

livezingy

Reputation: 181

Related discussion: Failed to load any lstm-specific dictionaries for lang xxx

Suppose your training folder looks like this:

OCRD/makefile
OCRD/data/foo-ground-truth

You could try the following steps:

  1. Find WORDLIST_FILE, NUMBERS_FILE and PUNC_FILE in the makefile, and change them to:

    WORDLIST_FILE := data/$(MODEL_NAME).wordlist
    NUMBERS_FILE := data/$(MODEL_NAME).numbers
    PUNC_FILE := data/$(MODEL_NAME).punc

  2. Suppose your base traineddata is eng.traineddata.

2.1 Download the .wordlist, .numbers and .punc files for eng from langdata_lstm.

2.2 Place them in OCRD/data.

2.3 If MODEL_NAME = foo, rename them to foo.wordlist, foo.numbers and foo.punc.

If you don't have a base traineddata, you could try this too. If your base traineddata is a different language, for example afr, download the files from langdata_lstm/afr instead.

  3. Run make training again.
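Steps 2.1 to 2.3 could be scripted roughly like this. It is only a sketch: the function name install_lstm_lists and its arguments are made up for illustration, and it assumes you already have a local copy of langdata_lstm with the usual lang/lang.wordlist layout.

```shell
#!/bin/sh
# Copy the word, number and punctuation lists of a base language into the
# training data folder under the model's name (steps 2.1 to 2.3).
# Arguments (hypothetical names chosen for this sketch):
#   $1  path to a local copy of langdata_lstm
#   $2  base language code, e.g. eng
#   $3  MODEL_NAME used in the makefile, e.g. foo
#   $4  destination folder, e.g. OCRD/data
install_lstm_lists() {
    src_dir=$1
    base_lang=$2
    model_name=$3
    dest_dir=$4
    for ext in wordlist numbers punc; do
        src="$src_dir/$base_lang/$base_lang.$ext"
        # Stop early if the base language does not ship this list.
        if [ ! -f "$src" ]; then
            echo "missing: $src" >&2
            return 1
        fi
        # Copy and rename in one go: eng.wordlist becomes foo.wordlist.
        cp "$src" "$dest_dir/$model_name.$ext" || return 1
    done
}

# Example call, assuming langdata_lstm sits next to the OCRD folder:
# install_lstm_lists ./langdata_lstm eng foo ./OCRD/data
```

After the three files are in place, run make training again from the OCRD folder.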

The cause of this error: in OCRD, the default path of the three files above is $(OUTPUT_DIR) = data/$(MODEL_NAME), and all files in this path are generated automatically during the training process.

If the variable START_MODEL is not assigned, the makefile will not generate any related files under this path.

If the variable START_MODEL has been assigned, foo.lstm-number-dawg, foo.lstm-punc-dawg, foo.lstm-word-dawg and so on will be produced in data/$(MODEL_NAME), but they are not the right ones. So there may be a bug in OCRD.
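To see whether a finished traineddata contains the LSTM dictionaries at all, you can list its components with combine_tessdata -d and look for the three dawg names mentioned above. A rough sketch follows; the helper name missing_lstm_dawgs and the path data/foo.traineddata are assumptions for illustration. It only checks that the components are present, not that their contents are the right ones.

```shell
#!/bin/sh
# Read a component listing on stdin (for example the output of
# `combine_tessdata -d`) and report which LSTM dawg components are absent.
# Returns 0 if all three are listed, 1 otherwise.
missing_lstm_dawgs() {
    listing=$(cat)
    status=0
    for comp in lstm-word-dawg lstm-punc-dawg lstm-number-dawg; do
        case $listing in
            *"$comp"*) ;;   # component name appears in the listing
            *) echo "missing component: $comp"; status=1 ;;
        esac
    done
    return $status
}

# Example call (path is an assumption; 2>&1 because the tool may log to stderr):
# combine_tessdata -d data/foo.traineddata 2>&1 | missing_lstm_dawgs
```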

Upvotes: 1
