Reputation: 1547
I want to run OCR on this image. The text has a predefined format: the first five characters are letters, the next four are digits, and the last one is a letter.
When I execute the following command:
$ tesseract in.png stdout
I get the output BDVPD474SQ
So I tried user patterns. I created a file named bazaar in the directory /usr/share/tesseract-ocr/tessdata/configs with the following content:
load_system_dawg F
load_freq_dawg F
user_patterns_suffix user-patterns
I also created a file named eng.user-patterns in the directory /usr/share/tesseract-ocr/tessdata with the following content:
\A\A\A\A\A\d\d\d\d\A
Still, I get the same result:
$ tesseract in.png stdout bazaar
BDVPD474SQ
What am I doing wrong? Has anyone accomplished this with Tess4j?
Upvotes: 10
Views: 2718
Reputation: 427
You can add the option --oem 0 to ensure user patterns apply. See this PR comment.
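As a rough illustration, the invocation with the legacy engine could be scripted as below. This is only a sketch: the helper names build_tesseract_command and run_ocr are made up for this example, and it assumes the bazaar config from the question is installed and that the installed eng.traineddata includes the legacy model (the tessdata_fast and tessdata_best files are LSTM-only, so --oem 0 fails with them).

```python
import subprocess


def build_tesseract_command(image_path, config_name="bazaar", oem=0):
    """Return the argument list for a Tesseract run (hypothetical helper).

    --oem 0 selects the legacy engine; the config file name goes last,
    after the options.
    """
    return ["tesseract", image_path, "stdout", "--oem", str(oem), config_name]


def run_ocr(image_path, config_name="bazaar", oem=0):
    """Run Tesseract and return the recognized text without surrounding whitespace.

    Assumes the tesseract binary is on PATH and the image exists.
    """
    command = build_tesseract_command(image_path, config_name, oem)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even on a machine without Tesseract installed.
    print(" ".join(build_tesseract_command("in.png")))
```

Calling run_ocr("in.png") would then be the scripted equivalent of running tesseract in.png stdout --oem 0 bazaar by hand.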
Since I am on Tesseract 5.3.3, I had to tweak your input image to reproduce similar behavior. I specified the user pattern \A\A\A\A\A\d\d\d\A\A to force recognition of the partially erased 9 character as a letter.
With --oem 0, Tesseract returns BDVPD474SQ (it reads an S).
Without the option, Tesseract returns BDVPD474SQ (identifying a 5).
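Whichever engine is used, it may be worth checking the result against the expected format afterwards, since user patterns only bias recognition and do not guarantee a conforming result. A minimal sketch, assuming the original format from the question (five upper-case letters, four digits, one upper-case letter) and assuming that \A in a user-patterns file means an upper-case letter and \d a digit; the name matches_expected_format and the sample string BDVPD4749Q are hypothetical:

```python
import re

# Assumed regex equivalent of the user pattern \A\A\A\A\A\d\d\d\d\A:
# five upper-case letters, four digits, one upper-case letter.
CODE_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def matches_expected_format(text):
    """Return True if the whole OCR result has the expected shape."""
    return CODE_FORMAT.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    # The output reported in the question has a letter where a digit belongs.
    print(matches_expected_format("BDVPD474SQ"))
    # A hypothetical reading with a digit in the ninth position.
    print(matches_expected_format("BDVPD4749Q"))
```

A result that fails this check can be flagged for manual review or retried with different preprocessing.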
Upvotes: 0