skiwi

Reputation: 69379

Tesseract hOCR: How to detect upside down?

(I'll answer my own question here for general knowledge)

In Tesseract OCR, how do you detect that an image is upside down?
People who have worked with Tesseract may or may not know that it can read images that are presented upside down.
The problem is that with hOCR output you cannot tell that the page was upside down, because nothing in the document says so.

So how can it be detected?

Upvotes: 1

Views: 1709

Answers (1)

skiwi

Reputation: 69379

After double-checking, I found that the orientation really is not stated directly in the hOCR output. I would have expected some attribute on the ocr_page div to denote it.

What I have figured out is that you can read the y-values of the bounding boxes of all ocr_careas on a page, in the order they appear in the document:

  • If the values go from low to high, then the page is in normal orientation.
  • If the values go from high to low, then the page is upside down.
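The check above can be sketched in Python using only the standard library. This is a rough sketch, not a tested solution: it assumes the input holds a single page, that each ocr_carea element carries a title attribute of the form "bbox x0 y0 x1 y1", and that comparing the top edge (y0) of consecutive areas is good enough. The names CareaCollector and looks_upside_down are made up for this example, and multi-column layouts could easily confuse it.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class CareaCollector(HTMLParser):
    """Collects the top y-value (y0) of every ocr_carea, in document order."""

    def __init__(self):
        super().__init__()
        self.tops = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "ocr_carea" not in classes:
            return
        match = BBOX_RE.search(attr.get("title") or "")
        if match:
            self.tops.append(int(match.group(2)))


def looks_upside_down(hocr_text):
    """Return True if y-values mostly fall, False if they mostly rise,
    and None if there is not enough evidence to decide."""
    collector = CareaCollector()
    collector.feed(hocr_text)
    tops = collector.tops
    if len(tops) < 2:
        return None
    pairs = list(zip(tops, tops[1:]))
    rising = sum(1 for a, b in pairs if b > a)
    falling = sum(1 for a, b in pairs if b < a)
    if rising == falling:
        return None
    return falling > rising
```

Counting rising and falling pairs, instead of requiring a strictly monotonic sequence, is a deliberate choice so that one out-of-order area does not flip the result. To use it, read the hOCR file into a string and pass it to looks_upside_down.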

This may or may not work for 90 and 270 degree rotations, but you may well see a similar pattern in the x-values.

Upvotes: -1
