T Vlad

Reputation: 63

Tesseract confuses "-" and "7" in a single-line image

This image is recognized as 08787365076858 instead of 0878-3650-6858.

I have 50 similar image files, and in each of them every "-" character is recognized as "7".

I used the default settings, and the result is the same with a fresh Tesseract install on a clean system. I also tried -psm 7 and -psm 8 (single line / single word) and setting a character whitelist.

What could be the reason for this, and how can I work around it? I know about training, but I'm curious why Tesseract, which is accurate in most cases, confuses such different characters.

Upvotes: 0

Views: 641

Answers (1)

nguyenq

Reputation: 8345

Rescaling the image to 300 DPI should help Tesseract pick up the dashes.
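One way to do the rescaling is sketched below in Python with Pillow. This is only an illustration, not something taken from the question: the function names, the file paths, and the fallback of 72 DPI for images that carry no DPI metadata are all assumptions, so adjust them to the actual files.

```python
def target_size(width, height, current_dpi, target_dpi=300):
    """Pixel size that keeps the physical size but reaches target_dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    scale = target_dpi / current_dpi
    return max(1, round(width * scale)), max(1, round(height * scale))


def rescale_to_dpi(src_path, dst_path, assumed_dpi=72, target_dpi=300):
    """Upscale an image so that it is effectively target_dpi, then save it.

    assumed_dpi is a guess used only when the file has no DPI metadata.
    """
    # Pillow is imported here so target_size() stays usable without it.
    from PIL import Image

    with Image.open(src_path) as img:
        # Use the stored horizontal DPI if present, else fall back to the guess.
        dpi = img.info.get("dpi", (assumed_dpi, assumed_dpi))[0] or assumed_dpi
        new_size = target_size(img.width, img.height, float(dpi), target_dpi)
        resized = img.resize(new_size, Image.LANCZOS)
        # Record the new resolution in the output file's metadata.
        resized.save(dst_path, dpi=(target_dpi, target_dpi))
    return new_size
```

Calling something like `rescale_to_dpi("number.png", "number_300dpi.png")` (hypothetical file names) and then running Tesseract on the output file with the same options as before would show whether the dashes are now read correctly.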

Upvotes: 2
