Adam

Reputation: 109

How to determine if an image needs to be rotated

I am trying to find a way to determine whether an image needs to be rotated in order for the text to be horizontally aligned. And if it does need to be rotated then by how many degrees?

I am sending the images to tesseract and for tesseract to be effective, the text in the images needs to be horizontally aligned.

I'm looking for a way to do this without depending on the "Orientation" metadata in the image.

I've thought of the following ways to do this:

  1. Rotate the image 90 degrees clockwise four times and send all four images to tesseract. This isn't ideal because each image has to be processed 4 times.
  2. Use a Hough line transform to check whether the text lines are vertical or horizontal, and rotate the image if they are vertical. Even then the image might still need a 180-degree rotation, so I'm unsure how effective this would be.
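
To make the second idea concrete, here is a minimal sketch of the decision step only. It assumes the line angles have already been extracted (for example from a Hough transform) and converted to degrees measured from the horizontal axis. The function name `dominant_direction` and the 15-degree tolerance are illustrative assumptions, not part of any library.

```python
def dominant_direction(line_angles_deg, tolerance=15.0):
    """Classify detected lines as mostly 'horizontal', mostly 'vertical', or 'unknown'.

    line_angles_deg: angles of detected lines in degrees from the horizontal
    axis (assumed input, e.g. derived from a Hough transform).
    tolerance: how far from 0 or 90 degrees a line may be and still count
    (an arbitrary illustrative value).
    """
    horizontal = 0
    vertical = 0
    for angle in line_angles_deg:
        # A line has no direction, so 10 degrees and 190 degrees are the same line.
        a = angle % 180.0
        if a <= tolerance or a >= 180.0 - tolerance:
            horizontal += 1
        elif abs(a - 90.0) <= tolerance:
            vertical += 1
    if horizontal == vertical:
        return "unknown"
    return "horizontal" if horizontal > vertical else "vertical"
```

Because angles are folded into the range 0 to 180, an upright page and an upside-down page give the same answer, which is exactly the 180-degree ambiguity mentioned above.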

I'm wondering if there are other ways to accomplish this using OpenCV, ImageMagick, or any other image processing techniques.

Upvotes: 4

Views: 3305

Answers (3)

Eric Ihli

Reputation: 1907

You can figure this out in a terminal with tesseract's --psm (page segmentation mode) option.

tesseract --psm 0 "infile" "outfile" will create outfile.osd which contains the info:

Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55
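
If you want to use these values from a script, the text is simple to parse. The following is a minimal sketch that assumes the .osd file has the "key: value" layout shown above; `parse_osd` is a hypothetical helper name, not part of tesseract.

```python
def parse_osd(text):
    """Turn OSD output in 'key: value' form into a dictionary.

    Numeric values become int or float; anything else stays a string.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a 'key: value' shape
        key = key.strip()
        value = value.strip()
        try:
            info[key] = float(value) if "." in value else int(value)
        except ValueError:
            info[key] = value
    return info
```

For the output above, `parse_osd(open("outfile.osd").read())["Rotate"]` would give 270. With a confidence as low as the one in this example, it may be worth falling back to another check before trusting the result.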

man tesseract

...       
--psm N
           Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:

               0 = Orientation and script detection (OSD) only.
               1 = Automatic page segmentation with OSD.
               2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
...

Upvotes: 1

HugoRune

Reputation: 13799

Attempting OCR on all 4 orientations seems like a reasonable choice, and I doubt you will find a more reliable heuristic.

If speed is an issue, you could OCR a small part of the image first. Select a rectangular region that has the proper amount of edge pixels and white/black ratio for text, then send that to tesseract in different orientations. With a small region, you could even try smaller steps than 90°, or combine it with another heuristic like Hough.
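
A rough sketch of the region selection, using only the black/white ratio and ignoring the edge-pixel criterion for brevity. It assumes the image is already binarized into a 2D list where 1 means a dark pixel. The name `pick_text_tile` and the default `target_ink` value are illustrative assumptions; a sensible target would have to be measured on your own scans.

```python
def pick_text_tile(binary, tile_h, tile_w, target_ink=0.15):
    """Return (top, left) of the tile whose dark-pixel fraction is closest to target_ink.

    binary: 2D list of 0/1 values, 1 = dark pixel (assumed pre-binarized).
    Tiles are non-overlapping; partial tiles at the borders are skipped.
    Returns None if the image is smaller than one tile.
    """
    rows, cols = len(binary), len(binary[0])
    best, best_diff = None, float("inf")
    for top in range(0, rows - tile_h + 1, tile_h):
        for left in range(0, cols - tile_w + 1, tile_w):
            ink = sum(
                binary[r][c]
                for r in range(top, top + tile_h)
                for c in range(left, left + tile_w)
            )
            diff = abs(ink / (tile_h * tile_w) - target_ink)
            if diff < best_diff:
                best, best_diff = (top, left), diff
    return best
```

The returned corner can then be used to crop the region that is sent to tesseract in each orientation.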

If you remember the most likely orientation based on previous images, and stop once an orientation is successfully processed by tesseract, you probably do not even have to try most orientations in most cases.
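
Something like the following could implement that idea. It is only a sketch: `rotate` and `score` are placeholders you would supply yourself (for example an image rotation and some measure of how well tesseract read the result), and the class name `OrientationFinder` and the threshold are assumptions for illustration.

```python
from collections import Counter


class OrientationFinder:
    """Try orientations in order of past success and stop at the first good one."""

    def __init__(self, rotate, score, threshold, angles=(0, 90, 180, 270)):
        self.rotate = rotate        # callable: (image, angle) -> rotated image
        self.score = score          # callable: rotated image -> number, higher is better
        self.threshold = threshold  # score at which an orientation counts as successful
        self.angles = tuple(angles)
        self.history = Counter()    # how often each angle was chosen before

    def candidate_order(self):
        # Most frequently chosen angles first; sorted() is stable, so ties
        # keep the order given in self.angles.
        return sorted(self.angles, key=lambda a: -self.history[a])

    def find(self, image):
        best_angle, best_score = None, float("-inf")
        for angle in self.candidate_order():
            s = self.score(self.rotate(image, angle))
            if s > best_score:
                best_angle, best_score = angle, s
            if s >= self.threshold:
                break  # good enough, skip the remaining orientations
        self.history[best_angle] += 1
        return best_angle
```

If no orientation reaches the threshold, all of them are tried and the best-scoring one is returned, which is the same cost as the plain four-orientation approach.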

Upvotes: 1

Bharat

Reputation: 2179

If you have about 1000 images labeled horizontal or vertical, you can resize them to 224x224 and then fine-tune a convolutional neural network such as AlexNet or VGG for this task. If you want to know how many right rotations the image needs, you can set the labels to the number of clockwise rotations: 0, 1, 2, or 3.

http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
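
One way to get such labels without manual annotation is to start from images known to be upright and rotate them yourself. The sketch below uses a plain 2D list as a stand-in for an image, and the helper names `rotate_cw` and `make_labeled_rotations` are hypothetical. Here the label is the number of clockwise quarter turns needed to bring the sample back upright.

```python
def rotate_cw(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def make_labeled_rotations(upright):
    """Return four (image, label) pairs for one upright image.

    label = number of clockwise quarter turns that restore the upright image.
    """
    samples = []
    for label in range(4):
        image = upright
        # Turning clockwise (4 - label) times leaves `label` turns to complete the circle.
        for _ in range((4 - label) % 4):
            image = rotate_cw(image)
        samples.append((image, label))
    return samples
```

Each upright image then yields four training samples, one per class.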

Upvotes: 2
