Reputation: 17
I am struggling to get the text from the image where the text is bold. I have attached the image here.
I have inverted the color of the image using OpenCV and changed it to
I want Tesseract to give 5 as the text output, but I get an empty value.
Image with text in multiple lines. The data from this image is not being extracted using psm 7, 8, or 9.
Upvotes: 1
Views: 3662
Reputation: 8626
Both images can be recognized with psm set to 7, 8, or 9. If you are using Tesseract 3.x.x, I would suggest switching to Tesseract 4.0.0 alpha for improved OCR results, and using --psm 9.
Page segmentation mode:
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
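As a rough sketch of how these modes could be tried from Python, the snippet below builds the Tesseract command line and runs it through subprocess. The helper names build_tesseract_cmd and ocr are hypothetical and not part of any library, and the sketch assumes the tesseract executable is on your PATH. Note that Tesseract 3.x spells the flag -psm, while 4.x uses --psm.

```python
import subprocess


def build_tesseract_cmd(image_path, psm, legacy=False):
    """Build a Tesseract CLI command that prints the recognized text to stdout.

    Hypothetical helper for illustration only.
    """
    # Tesseract 3.x expects -psm; 4.x and later expect --psm.
    flag = "-psm" if legacy else "--psm"
    # "stdout" as the output base makes Tesseract print instead of writing a file.
    return ["tesseract", image_path, "stdout", flag, str(psm)]


def ocr(image_path, psm, legacy=False):
    """Run Tesseract on one image and return the stripped text.

    Assumes the tesseract executable is installed and on PATH.
    """
    result = subprocess.run(
        build_tesseract_cmd(image_path, psm, legacy),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling ocr("five.png", psm) for each psm in (7, 8, 9) would then show which mode returns the digit; five.png is only a placeholder file name.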
Hope this helps.
EDIT:
Regarding your additional question about choosing the psm on the fly, you can check the image height to determine which psm value to use.
For example, the height of the 5 image is 80 pixels and the height of the fox message image is 480 pixels. With the pixel height available, it is straightforward to write code that sets the psm value.
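A minimal sketch of that height check might look like the following. The function name choose_psm, the 200-pixel threshold, and the choice of psm 6 (single uniform block of text) for tall images are all assumptions made for illustration; only the heights 80 and 480 come from the images above, so tune the cutoff for your own data.

```python
def choose_psm(height_px, threshold_px=200):
    """Pick a page segmentation mode from the image height in pixels.

    Hypothetical helper: the threshold and the psm for tall images are
    assumptions, not values given in the answer.
    """
    if height_px < threshold_px:
        # Short image, e.g. the single digit "5" (height 80).
        return 9
    # Tall image, e.g. the multi-line fox message (height 480).
    # psm 6 treats the image as a single uniform block of text.
    return 6
```

With OpenCV, the height is the first element of the array shape, so something like choose_psm(img.shape[0]) would work on an image loaded with cv2.imread.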
Upvotes: 2