Sivaram Rasathurai

Reputation: 6333

Does Tesseract do image resizing internally?

OpenCV doesn't read image metadata, so we can't get the DPI of an image from it. When someone asks a DPI-related OCR question on Stack Overflow, most of the answers say that we don't need DPI; we only need the pixel size.

Changing image DPI for usage with tesseract

Change dpi of an image in OpenCV

In other questions, where DPI isn't mentioned and the goal is simply to improve OCR accuracy, some answers suggest that setting the DPI to 300 will improve accuracy.

Tesseract OCR How do I improve result?

Best way to recognize characters in screenshot?

One more thing: Tesseract's official page says this about it:

Tesseract works best on images which have a DPI of at least 300 dpi, so it may be beneficial to resize images.

After some Googling, I found the following:

  1. We can't tell the image resolution from its height and width alone.
  2. We want an image resolution that is high enough to support accurate OCR.
  3. Font size is typically a physical length, not a pixel count: 72 points make one inch, so a 12pt font is 1/6 inch tall.
  4. In a 300 ppi image, 12pt text is 300 × 1/6 = 50 pixels tall. At 60 ppi, it is 60 × 1/6 = 10 pixels tall.
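To make the arithmetic in points 3 and 4 concrete, here is a minimal sketch. The function name `text_height_px` is mine (hypothetical), and it assumes the nominal font size is the text height, which is only an approximation for real fonts.

```python
POINTS_PER_INCH = 72  # typographic convention: 72 points = 1 inch


def text_height_px(point_size, ppi):
    """Approximate pixel height of text with the given point size at the given ppi."""
    # Multiply before dividing so common cases come out as exact values.
    return point_size * ppi / POINTS_PER_INCH


print(text_height_px(12, 300))  # 12pt text at 300 ppi
print(text_height_px(12, 60))   # 12pt text at 60 ppi
```

The same image content therefore ends up with very different pixel heights depending on the resolution it was rendered or scanned at.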

The following is quoted from the official Tesseract page, under "Is there a Minimum / Maximum Text Size? (It won’t read screen text!)":

There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be “noise removed”.

Using LSTM there seems also to be a maximum x-height somewhere around 30 px. Above that, Tesseract doesn’t produce accurate results. The legacy engine seems to be less prone to this (see https://groups.google.com/forum/#!msg/tesseract-ocr/Wdh_JJwnw94/24JHDYQbBQAJ).
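As a rough reading of the quoted thresholds, the quick x-height check could be sketched like this. The function name `xheight_verdict` and the returned labels are my own (hypothetical); the cut-offs of 8, 10 and 30 px are taken from the quote above, and the quote itself describes them as approximate.

```python
def xheight_verdict(xheight_px):
    """Classify a measured x-height (in pixels) against the quoted rules of thumb."""
    if xheight_px < 8:
        return "likely removed as noise"
    if xheight_px < 10:
        return "too small for accurate results"
    if xheight_px > 30:
        return "possibly too large for the LSTM engine"
    return "ok"


for px in (7, 9, 20, 40):
    print(px, xheight_verdict(px))
```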

From all this, my conclusion is that we need text at roughly a 10 to 12 pt font size for OCR. That means that at 120 ppi (pixels per inch), 12pt text needs to be about 20 pixels tall, and at 300 ppi it needs to be about 50 pixels tall.


  1. If OpenCV doesn't read the DPI information, what DPI value does Tesseract assume for an image loaded with OpenCV's imread method?

  2. Does Tesseract resize the image internally based on its DPI?

  3. If resizing does happen internally based on DPI, then after resizing the image with OpenCV I need to set its DPI to 300. What is the easiest way to set the DPI with OpenCV + pytesseract? I know we can do this with PIL.

Upvotes: 5

Views: 5805

Answers (1)

rinogo

Reputation: 9153

To answer your questions:

  1. DPI is only really relevant when scanning documents - it's a measure of how many dots per inch are used to represent the scanned image. Once tesseract is processing images, it only cares about pixels.

  2. Not as far as I can tell.

  3. The SO answer you linked to relates to writing an image, not reading an image.

I think I understand the core of what you're trying to get at. You're trying to improve the accuracy of your results as it relates to font/text size.

Generally speaking, tesseract seems to work best on text that is about 32 px tall.

Manual resizing

If you're working on a small set of images or a consistent group of images, you can manually resize those images to have capital letters that are approximately 32 pixels tall. That should theoretically give the best results in tesseract.

Automatic resizing

I'm working with an inconsistent data set, so I need an automated approach to resizing images. What I do is to find the bounding boxes for text within the image (using tesseract itself, but you could use EAST or something similar).

Then, I calculate the median height of these bounding boxes. Using that, I can calculate how much I need to resize the image so that the median height of a capital letter in the image is ~32 px tall.
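A minimal sketch of that calculation follows. It is not the code from my Gist: the names `scale_factor` and `resize_for_ocr` are hypothetical, and it assumes word-level bounding box heights from `pytesseract.image_to_data` are a good enough stand-in for capital-letter height.

```python
from statistics import median

TARGET_HEIGHT_PX = 32  # target text height suggested in this answer


def scale_factor(box_heights, target_px=TARGET_HEIGHT_PX):
    """Return the factor that maps the median box height to target_px."""
    heights = [h for h in box_heights if h > 0]
    if not heights:
        raise ValueError("no usable bounding boxes")
    return target_px / median(heights)


def resize_for_ocr(image):
    """Hypothetical helper: detect words, then rescale a NumPy image array."""
    # Imported lazily so scale_factor() works without these packages installed.
    import cv2
    import pytesseract

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    # Keep only boxes that actually contain recognized text.
    heights = [h for h, text in zip(data["height"], data["text"]) if text.strip()]
    factor = scale_factor(heights)
    # Assumption: cubic interpolation for enlarging, area averaging for shrinking.
    interpolation = cv2.INTER_CUBIC if factor > 1 else cv2.INTER_AREA
    return cv2.resize(image, None, fx=factor, fy=factor, interpolation=interpolation)
```

Using the median rather than the mean keeps a few oversized boxes (headings, misdetections) from skewing the scale.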

Once I've resized the image, I rerun tesseract and hope for the best. Yay!

Hope that helps somewhat! :)


Bonus: I shared my source code for this function in this Gist

Upvotes: 5
