Reputation: 267
Currently, I am working on an OCR project where I need to read the text off of a label (see example images below). I am running into issues with image skew, and I need help correcting it so the text is horizontal rather than at an angle. The process I am using now scores different angles from a given range (code included below), but this method is inconsistent: it sometimes overcorrects the skew, or fails to detect it and correct it at all. As a note, before the skew correction I rotate all of the images by 270 degrees to get the text upright, then I pass each image through the code below. The image passed to the function is already a binary image.
Code:
import cv2
import numpy as np
from scipy.ndimage import interpolation as inter

def findScore(img, angle):
    """
    Generates a score for the binary image received, for the candidate angle.
    Vars:
    - img <- numpy array of the label
    - angle <- candidate angle at which the image is rotated
    Returns:
    - histogram of the image
    - score of the candidate angle
    """
    data = inter.rotate(img, angle, reshape=False, order=0)
    hist = np.sum(data, axis=1)
    score = np.sum((hist[1:] - hist[:-1]) ** 2)
    return hist, score

def skewCorrect(img):
    """
    Takes in an nparray, determines the skew angle of the text, then corrects the skew and returns the corrected image.
    Vars:
    - img <- numpy array of the label
    Returns:
    - corrected image as a numpy array
    """
    # Downscale the image before scoring the candidate angles
    img = cv2.resize(img, (0, 0), fx=0.75, fy=0.75)
    delta = 1
    limit = 45
    angles = np.arange(-limit, limit + delta, delta)
    scores = []
    for angle in angles:
        hist, score = findScore(img, angle)
        scores.append(score)
    bestScore = max(scores)
    bestAngle = angles[scores.index(bestScore)]
    rotated = inter.rotate(img, bestAngle, reshape=False, order=0)
    print("[INFO] angle: {:.3f}".format(bestAngle))
    #cv2.imshow("Original", img)
    #cv2.imshow("Rotated", rotated)
    #cv2.waitKey(0)
    return rotated
Example images of the label before correction and after
Before correction ->
After correction
If anyone can help me figure this problem out, it would be much appreciated.
Upvotes: 13
Views: 34924
Reputation: 11
To add to @full_pr0's answer, you can speed up the calculation 4-5x by stacking the rotated images into a single array and scoring them in one vectorized pass:
import cv2
import numpy as np

def rotate_image(image, angle):
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    corrected = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                               borderMode=cv2.BORDER_REPLICATE)
    return corrected

def determine_score(arr):
    # arr has shape (num_angles, height, width): summing over axis 2 gives
    # one row-projection histogram per rotated image
    histogram = np.sum(arr, axis=2, dtype=float)
    score = np.sum((histogram[..., 1:] - histogram[..., :-1]) ** 2,
                   axis=1, dtype=float)
    return score

def correct_skew(image, delta=0.1, limit=5):
    thresh = cv2.threshold(image, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    angles = np.arange(-limit, limit + delta, delta)
    img_stack = np.stack([rotate_image(thresh, angle) for angle in angles],
                         axis=0)
    scores = determine_score(img_stack)
    best_angle = angles[np.argmax(scores)]
    corrected = rotate_image(image, best_angle)
    return best_angle, corrected

img_path = 'test.jpg'
img = cv2.imread(img_path, 0)
angle, corrected = correct_skew(img)
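To sanity-check the claimed speedup on your own labels, a quick and purely illustrative timing of the stacked version could look like this (compare it against the per-angle loop from the other answers on the same image):
import timeit

# Time the stacked correct_skew() defined above on the loaded grayscale image.
print(timeit.timeit(lambda: correct_skew(img), number=3), 'seconds for 3 runs')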
Upvotes: 1
Reputation: 46600
Here's an implementation of the Projection Profile Method for skew angle estimation. Candidate angles within a search interval are projected into an accumulator, and the skew angle is defined as the angle of projection that maximizes alignment. The idea is to rotate the image at each candidate angle and generate a histogram of pixel sums per row for each iteration. To determine the skew angle, we score how sharply adjacent histogram rows differ (the sum of squared differences between peaks), take the angle with the maximum score, and rotate the image by it to correct the skew.
Original ->
Corrected
Skew angle: -2
import cv2
import numpy as np
from scipy.ndimage import interpolation as inter

def correct_skew(image, delta=1, limit=5):
    def determine_score(arr, angle):
        data = inter.rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1, dtype=float)
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2, dtype=float)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    corrected = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                               borderMode=cv2.BORDER_REPLICATE)

    return best_angle, corrected

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle, corrected = correct_skew(image)
    print('Skew angle:', angle)
    cv2.imshow('corrected', corrected)
    cv2.waitKey()
Note: You may have to adjust the delta or limit values depending on the image. The delta value controls the iteration step, and the search iterates up to the limit value, which controls the maximum angle. This method is straightforward in that it iteratively checks each angle + delta, and it currently only corrects skew in the range of +/- 5 degrees. If you need to correct a larger angle, adjust the limit value. For another approach to handling skew, take a look at this alternative method.
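For example, if your labels can be skewed well beyond 5 degrees, one option is a coarse sweep followed by a fine sweep on the coarsely corrected image. This is only an illustrative sketch using the correct_skew() function above; the parameter values are arbitrary:
# Coarse pass: wide +/- 45 degree range in 1-degree steps.
coarse_angle, coarse_img = correct_skew(image, delta=1, limit=45)
# Fine pass on the coarsely corrected image: +/- 2 degrees in 0.1-degree steps.
fine_angle, corrected = correct_skew(coarse_img, delta=0.1, limit=2)
print('Total skew angle:', coarse_angle + fine_angle)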
Upvotes: 26
Reputation: 51
To add up to @nathancy answer, for windows users, if you're getting additional skew just add dtype=float
. Whenever you create a numpy array. There's a integer overflow issue with windows as it assigns int(32) bit as data type unlike rest of the systems.
See below code; added dtype=float
in np.sum()
methods:
import cv2
import numpy as np
from scipy.ndimage import interpolation as inter

def correct_skew(image, delta=1, limit=5):
    def determine_score(arr, angle):
        data = inter.rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1, dtype=float)
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2, dtype=float)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)

    return best_angle, rotated

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle, rotated = correct_skew(image)
    print(angle)
    cv2.imshow('rotated', rotated)
    cv2.imwrite('rotated.png', rotated)
    cv2.waitKey()
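For a concrete sense of why the default integer type overflows here, a small, purely illustrative demonstration (the 4000-pixel row width is made up):
import numpy as np

# Projection histogram alternating between an empty row and a fully white
# 4000-pixel row: the squared differences reach ~1e12, well beyond the
# ~2.1e9 maximum of a 32-bit integer.
hist_int32 = np.array([0, 255 * 4000] * 5, dtype=np.int32)  # Windows default int
hist_float = hist_int32.astype(float)                       # the dtype=float fix

wrong = np.sum((hist_int32[1:] - hist_int32[:-1]) ** 2)
right = np.sum((hist_float[1:] - hist_float[:-1]) ** 2)

print(wrong)  # silently wrapped, wrong score
print(right)  # ~9.36e12, the intended score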
Upvotes: 5
Reputation: 1246
ASSUMPTIONS:
- The tilt to be corrected is within the [-45:45] degree range.
- The content of the (cleaned-up) input image, taken as a whole, fits a rectangle reasonably tightly (see the last section below for when this does not hold).
SOLUTION:
hgt_rot_angle = cv2.minAreaRect(your_CLEAN_image_pixel_coordinates_to_enclose)[-1]
com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle
(h, w) = my_input_image.shape[0:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, com_rot_angle, 1.0)
corrected_image = cv2.warpAffine(your_ORIGINAL_image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
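As a side note on the first line: one common way to obtain the foreground pixel coordinates to enclose follows the pyimagesearch tutorial linked under ORIGINAL SOURCE below. The Otsu thresholding here is an assumption about the input, and the cast is there because cv2.minAreaRect() expects int32/float32 points:
import cv2
import numpy as np

# Threshold the (assumed BGR) original image and collect the (row, col)
# coordinates of every foreground pixel in the cleaned-up binary image.
gray = cv2.cvtColor(your_ORIGINAL_image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
your_CLEAN_image_pixel_coordinates_to_enclose = \
    np.column_stack(np.where(thresh > 0)).astype(np.float32)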
ORIGINAL SOURCE:
https://www.pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/ - a GREAT tutorial to get started (kudos to Adrian Rosebrock), BUT:
- the meaning of the angle returned by cv2.minAreaRect() is not quite clear there, and the code uses the same variable for detection and for correction, which is even more confusing. I used separate variables for clarity, and my explanation of the first two lines of code is below.
- the sign convention of the angle argument of the cv2.getRotationMatrix2D() function deserves clarification, based on OpenCV documentation and based on my testing. More on this below as well.
SOLUTION EXPLANATION:
The cv2.minAreaRect() function returns the rotation angle value in the [-90, 0] range as the last element of the returned tuple, and the angle value is tied to the HEIGHT value in the same returned tuple (it's located at cv2.minAreaRect()[1][1], to be precise, but we're not using it here).
Unless the angle of rotation is either -90.0 or 0.0, the decision of which dimension is chosen as the "height" is not arbitrary - it always has to go from upper left to lower right, i.e. it has to have a negative slope. What this means for our use case is that, depending on the width-height proportion of the content block and on its tilt, the "height" value returned by cv2.minAreaRect() can be either the content block's logical height OR its width.
This means 2 things for us:
So, given (1) no assumptions about the content block's aspect ratio and (2) the assumed [-45:45] range of the tilt, we can get the common tilt of the height and the width relative to the rectangular coordinate system (in the [-45:45] range) by simply adding 90 degrees to the rotation value of the "height" whenever it falls below -45.0.
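To make that normalization concrete, here is a tiny illustration with made-up cv2.minAreaRect() angles:
# Made-up angles from the [-90, 0] range that cv2.minAreaRect() returns:
for hgt_rot_angle in (-88.0, -46.0, -45.0, -3.0, 0.0):
    com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle
    print(hgt_rot_angle, '->', com_rot_angle)
# -88.0 -> 2.0   (the "height" was actually the tilted width)
# -46.0 -> 44.0
# -45.0 -> -45.0
# -3.0  -> -3.0
# 0.0   -> 0.0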
Once we have this detected and calculated "common rotation angle" value, we can use it to fix the tilt by simply passing the value directly to the cv2.getRotationMatrix2D() function.
NOTE: the calculated existing "common rotation angle" is negative for a counter-clockwise tilt and positive for a clockwise tilt, which is a very common everyday convention. However, if we think of the angle argument of cv2.getRotationMatrix2D() as "the correction angle to apply" (which, I think, was the intent), then the sign convention is the OPPOSITE. So we need to pass the detected and calculated "common rotation angle" value as-is if we want to see it counteracted in the output image, which is supported by the many tests that I have performed.
This is a direct quote on the angle parameter from the OpenCV documentation:
Rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the top-left corner).
WHAT IF THE SINGLE RECTANGLE IS A POOR FIT?
The above solution works very well for densely populated full page scans, clean labels and things like that, but it does not work well at all for sparsely populated images, where the overall tightest fit is not a rectangle, i.e. when the 2nd starting assumption does not hold.
In the latter scenario, the following may work IF most of the individual shapes in the input image can nicely fit into rectangles, or at least fit them better than all of the content combined; a sketch of this idea follows.
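One way to realize that per-shape idea (a hypothetical helper, not an exact recipe; it assumes a clean binary input with shapes in white, reasonably rectangular shapes, the OpenCV 4.x findContours() return signature, and the same [-90, 0] minAreaRect angle convention discussed above):
import cv2
import numpy as np

def estimate_skew_from_shapes(clean_binary_image, min_area=50):
    """Median of per-shape minAreaRect angles (a sketch of the idea above)."""
    contours, _ = cv2.findContours(clean_binary_image, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    angles = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:   # ignore specks and noise
            continue
        hgt_rot_angle = cv2.minAreaRect(contour)[-1]
        # Same normalization as in the single-rectangle solution above.
        angles.append(hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle)
    return float(np.median(angles)) if angles else 0.0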
OTHER SOURCES:
https://www.pyimagesearch.com/2015/11/30/detecting-machine-readable-zones-in-passport-images/
https://docs.opencv.org/master/dd/d49/tutorial_py_contour_features.html
Upvotes: 3