Tesseract OCR ignores lines containing asterisks

Question

I have an image created from an old fax document (the font is quite distinctive). Tesseract generally works fairly well on this input, except in one case: when a line starts with many leading asterisks ('*'), the whole line is ignored.

The OCR output differs depending on the page segmentation mode (psm):

psm 1: an empty page
psm 6: For queries please contact NA, KRKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK KKK KK KK KK

In every case, the title "comment" is skipped.
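A minimal sketch for reproducing the psm comparison above: it runs the tesseract command-line tool once per mode and keeps the raw text for each. It assumes Tesseract 4 or later is on the PATH (4.x spells the option `--psm`; 3.x used `-psm`). The helper names and the file name `fax.png` are hypothetical.

```python
import subprocess


def build_tesseract_cmd(image_path: str, psm: int) -> list[str]:
    # Passing "stdout" as the output base makes tesseract print the
    # recognised text instead of writing it to a .txt file.
    # "--psm" is the Tesseract 4+ spelling of the page segmentation option.
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def compare_psm_modes(image_path: str, modes: list[int]) -> dict[int, str]:
    # Run tesseract once per page segmentation mode and keep the raw text,
    # so outputs such as psm 1 vs psm 6 can be compared side by side.
    results = {}
    for psm in modes:
        completed = subprocess.run(
            build_tesseract_cmd(image_path, psm),
            capture_output=True,
            text=True,
            check=True,  # raise if tesseract exits with an error
        )
        results[psm] = completed.stdout
    return results
```

For example, `compare_psm_modes("fax.png", [1, 6])` would return a dict mapping 1 and 6 to the text tesseract printed for each mode.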

However, when I manually removed all the '*' characters from the image in Paint, the OCR worked fine. I have no idea how to get correct OCR output without preprocessing the image. Can someone explain what is going on?
