kant01

Reputation: 17

Tesseract to do OCR on images with bold text

I am struggling to extract the text from an image in which the text is bold. I have attached the image here: Original Image

I inverted the colors of the image using OpenCV, which gave this result: Inverted color

I want Tesseract to return 5 as the text output, but I get an empty value.

I also have an image with text on multiple lines. The text in this image is not extracted with psm 7, 8, or 9: Multiline text image

Upvotes: 1

Views: 3662

Answers (1)

thewaywewere

Reputation: 8626

Both images can be recognized with psm set to 7, 8, or 9. If you are using Tesseract 3.x.x, I would suggest switching to Tesseract 4.0.0 alpha for improved OCR results, and using --psm 9.

Page segmentation mode:
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
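
To make the psm setting concrete, here is a minimal sketch of calling the tesseract command-line tool from Python. The helper names `build_tesseract_cmd` and `run_ocr` and the file name in the usage comment are hypothetical, not from the question. It assumes the `tesseract` binary is on your PATH, and that 4.x takes `--psm` while 3.x takes `-psm`.

```python
import subprocess


def build_tesseract_cmd(image_path, psm, legacy_flag=False):
    """Build the argument list for the tesseract CLI (hypothetical helper).

    Assumption: Tesseract 4.x expects "--psm", while 3.x expects "-psm".
    Using "stdout" as the output base makes tesseract print the text
    instead of writing it to a .txt file.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, "stdout", flag, str(psm)]


def run_ocr(image_path, psm, legacy_flag=False):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, psm, legacy_flag),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.strip()


# Example usage (needs tesseract installed and a real image file):
# text = run_ocr("inverted.png", 9)
```

If the result is still empty, it may be worth trying the same call with psm 7 and 8 before changing anything else.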

Hope this helps.

EDIT:

Regarding your additional question about choosing the psm on the fly: you can check the image height to decide which psm value to use.

For example, the image with the 5 is 80 pixels high and the one with the fox message is 480 pixels high. With the pixel height known, it is easy to write code that sets the psm value.
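
A rough sketch of the height check could look like the following. The function name `choose_psm`, the 100 pixel threshold, and the choice of psm 6 (single uniform block of text) for tall images are my assumptions for illustration, so tune them for your own images.

```python
def choose_psm(height_px, threshold_px=100, short_psm=8, tall_psm=6):
    """Pick a page segmentation mode from the image height (hypothetical helper).

    Assumptions, not taken from the images in the question:
      - images shorter than threshold_px hold a single word or digit,
        so short_psm (8, single word) is used;
      - taller images hold several lines, so tall_psm (6, single uniform
        block of text) is used.
    """
    if height_px <= 0:
        raise ValueError("height_px must be a positive number of pixels")
    return short_psm if height_px < threshold_px else tall_psm


# With OpenCV the height is the first element of the image shape:
#   height_px = cv2.imread("image.png").shape[0]
small = choose_psm(80)   # the image with the 5
large = choose_psm(480)  # the image with the fox message
```

The returned value can then be passed to tesseract through the --psm option.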

Upvotes: 2
