Reputation: 109
I am trying to find a way to determine whether an image needs to be rotated in order for the text to be horizontally aligned. And if it does need to be rotated then by how many degrees?
I am sending the images to tesseract and for tesseract to be effective, the text in the images needs to be horizontally aligned.
I'm looking for a way do this without depending on the "Orientation" metadata in the image.
I've thought of following ways to do this:
I'm wondering if there are other ways to accomplish this using OpenCV, imageMagik or any other image processing techniques.
Upvotes: 4
Views: 3305
Reputation: 1907
You can figure this out in a terminal with tesseract's psm
option.
tesseract --psm 0 "infile" "outfile"
will create outfile.osd
which contains the info:
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 27.93
Script: Latin
Script confidence: 6.55
man tesseract
...
--psm N
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
...
Upvotes: 1
Reputation: 13799
Aytempting ocr on all 4 orientations seems like a reasonable choice, and I doubt you will find a more reliable heuristic.
If speed is an issue, you could OCR a small part of the image first. Select a rectangular region, that has the proper amount of edge pixels and white/black ratio for text, then send that to tesseract in different orientations. With a small region, you could even try smaller steps than 90°, or combine it with another heuristic like Hough.
If you remember the most likely orientation based on previous images, and stop once an orientation is successfully processed by tesseract, you probably do not even have to try most orientations in most cases.
Upvotes: 1
Reputation: 2179
If you have a 1000 images which say horizontal or vertical, you can resize these images to 224x224 and then fine-tune a Convolutional neural network, like AlexNet or VGG for this task. If you want to know how many right rotations to make for the image, you can set the labels as the number of clock-wise rotations, like 0,1,2,3.
http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
Upvotes: 2