Reputation: 2572
I'm seeking advice on which version of Tesseract I should use to train for an ancient language that has unique letters. The language is very similar to Arabic in its characteristics. It is written right-to-left, and some letters connect within a word. In other words, a letter might have three shapes depending on whether it comes at the beginning, middle or end of a word. It also has harakat (short vowel marks) that appear above or below letters.
The reason I'm asking is that I want to take advantage of the tools available for version 3.X, but this warning about Arabic threw me off, since this language is very similar to Arabic.
For anyone who's familiar with Tesseract: which version do you recommend for training such a language? Also, if you are aware of a better tool, please share it.
Upvotes: 0
Views: 727
Reputation: 8626
If you have a large number of documents to OCR, I would recommend Tesseract 4.0, as it's faster in general. You may refer to the information below in case you haven't read it before.
Tesseract 4.0 adds a new OCR engine mode (--oem 1), which is neural nets LSTM only. Tesseract 4.0.0 alpha has been available since last Nov/Dec.
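As a rough sketch of how the LSTM-only mode could be selected from a script: this assumes a Tesseract 4.x binary on your PATH and uses the stock Arabic model (ara) as a stand-in for whatever language you end up training. The function names and the file names in the usage note are placeholders of mine, not anything from Tesseract itself.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="ara", oem=1):
    """Build the argument list for a Tesseract 4.x run.

    oem=1 selects the neural-net LSTM-only engine.
    lang="ara" is only a stand-in; swap in your own trained language code.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]


def run_ocr(image_path, output_base, lang="ara"):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, lang=lang)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(cmd, check=True)
    # Tesseract appends ".txt" to the output base name by default.
    return output_base + ".txt"
```

Calling run_ocr("page.png", "page_out") would then leave the recognized text in page_out.txt, provided the traineddata file for the chosen language is installed.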
Hope this helps.
Upvotes: 2