Reputation: 69379
(I'll answer my own question here for general knowledge)
In Tesseract OCR, how do you detect an image that is upside down?
People who have worked with Tesseract may or may not know that it can read images that are presented upside down.
The issue, however, is that you cannot tell the image was upside down if you use hOCR output, because the orientation is not stated anywhere in the document.
So how can you detect it?
Upvotes: 1
Views: 1709
Reputation: 69379
After double-checking, I noticed that the orientation really is not stated directly in the hOCR output; I would have expected some attribute on the ocr_page div denoting it.
What I have figured out is that you can read the y-values of the bounding boxes of all ocr_carea elements on each page and look at how they change from one area to the next.
This may or may not work for 90- and 270-degree rotations, but it could very well be that you see a similar pattern in the x-values.
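A minimal sketch of the y-value check, using only the Python standard library. It assumes that each ocr_carea element carries a title attribute of the form "bbox x0 y0 x1 y1", and that on an upside-down page the top y-values mostly decrease in document order instead of increasing. That second assumption is my reading of the pattern, not something the hOCR output states. The names CareaCollector and looks_upside_down are made up for this example, and the input is expected to be the hOCR of a single page.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class CareaCollector(HTMLParser):
    """Collects (x0, y0, x1, y1) boxes of ocr_carea elements in document order."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "ocr_carea" not in classes:
            return
        match = BBOX_RE.search(attributes.get("title") or "")
        if match:
            self.boxes.append(tuple(int(value) for value in match.groups()))


def looks_upside_down(hocr_text):
    """Heuristic (assumption): True if top y-values mostly decrease,
    False if they mostly increase, None if there is no clear answer."""
    collector = CareaCollector()
    collector.feed(hocr_text)
    tops = [box[1] for box in collector.boxes]
    if len(tops) < 2:
        return None  # not enough areas to compare
    pairs = list(zip(tops, tops[1:]))
    decreasing = sum(1 for previous, current in pairs if current < previous)
    increasing = sum(1 for previous, current in pairs if current > previous)
    if decreasing == increasing:
        return None  # no dominant direction
    return decreasing > increasing
```

Pages with a single text area, or with multi-column layouts where the areas jump up and down, will give None or an unreliable answer, so treat the result as a hint and not as proof. The same comparison on box[0] instead of box[1] would be the place to start for the 90 and 270 degree cases.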
Upvotes: -1