Reputation: 267
Currently, I am working on an OCR project where I need to read the text off of a label (see example images below). I am running into issues with image skew, and I need help correcting it so the text is horizontal rather than at an angle. The process I am using now scores different angles from a given range (code included below), but this method is inconsistent: it sometimes overcorrects the skew, or fails to detect it and correct it at all. As a note, before the skew correction I rotate all of the images by 270 degrees to get the text upright, then I pass each image through the code below. The image passed to the function is already a binary image.
Code:
import cv2
import numpy as np
from scipy.ndimage import interpolation as inter

def findScore(img, angle):
    """
    Generates a score for the binary image received, for the candidate angle.
    Vars:
    - img <- numpy array of the label
    - angle <- candidate angle at which the image is rotated
    Returns:
    - histogram of the image
    - score of the candidate angle
    """
    data = inter.rotate(img, angle, reshape=False, order=0)
    hist = np.sum(data, axis=1)
    score = np.sum((hist[1:] - hist[:-1]) ** 2)
    return hist, score

def skewCorrect(img):
    """
    Takes in an nparray, determines the skew angle of the text, then corrects the skew and returns the corrected image.
    Vars:
    - img <- numpy array of the label
    Returns:
    - corrected image as a numpy array
    """
    # Downscale the image before scoring the candidate angles
    img = cv2.resize(img, (0, 0), fx=0.75, fy=0.75)
    delta = 1
    limit = 45
    angles = np.arange(-limit, limit + delta, delta)
    scores = []
    for angle in angles:
        hist, score = findScore(img, angle)
        scores.append(score)
    bestScore = max(scores)
    bestAngle = angles[scores.index(bestScore)]
    rotated = inter.rotate(img, bestAngle, reshape=False, order=0)
    print("[INFO] angle: {:.3f}".format(bestAngle))
    #cv2.imshow("Original", img)
    #cv2.imshow("Rotated", rotated)
    #cv2.waitKey(0)
    return rotated
Example images of the label before correction and after
Before correction ->
After correction
If anyone can help me figure this problem out, it would be much appreciated.
Upvotes: 13
Views: 34924
Reputation: 11
To add to @full_pr0's answer, you can speed up the calculation 4-5x by stacking the rotated images into a single array and scoring them in one vectorized pass:
import cv2
import numpy as np

def rotate_image(image, angle):
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    corrected = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                               borderMode=cv2.BORDER_REPLICATE)
    return corrected

def determine_score(arr):
    # arr has shape (num_angles, height, width): summing over axis 2 gives
    # one row-projection histogram per rotated image
    histogram = np.sum(arr, axis=2, dtype=float)
    score = np.sum((histogram[..., 1:] - histogram[..., :-1]) ** 2,
                   axis=1, dtype=float)
    return score

def correct_skew(image, delta=0.1, limit=5):
    thresh = cv2.threshold(image, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    angles = np.arange(-limit, limit + delta, delta)
    img_stack = np.stack([rotate_image(thresh, angle) for angle in angles],
                         axis=0)
    scores = determine_score(img_stack)
    best_angle = angles[np.argmax(scores)]
    corrected = rotate_image(image, best_angle)
    return best_angle, corrected

img_path = 'test.jpg'
img = cv2.imread(img_path, 0)
angle, corrected = correct_skew(img)
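To sanity-check the claimed speedup on your own labels, a quick and purely illustrative timing of the stacked version could look like this (compare it against the per-angle loop from the other answers on the same image):
import timeit

# Time the stacked correct_skew() defined above on the loaded grayscale image.
print(timeit.timeit(lambda: correct_skew(img), number=3), 'seconds for 3 runs')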
Upvotes: 1
Reputation: 46600
Here's an implementation of the Projection Profile Method for skew angle estimation. Candidate angles within a search interval are projected into an accumulator, and the skew angle is defined as the angle of projection that maximizes alignment. The idea is to rotate the image at each candidate angle and generate a histogram of pixel sums per row for each iteration. To determine the skew angle, we score how sharply adjacent histogram rows differ (the sum of squared differences between peaks), take the angle with the maximum score, and rotate the image by it to correct the skew.
Original ->
Corrected
Skew angle: -2
import cv2
import numpy as np
from scipy.ndimage import interpolation as inter

def correct_skew(image, delta=1, limit=5):
    def determine_score(arr, angle):
        data = inter.rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1, dtype=float)
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2, dtype=float)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    corrected = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                               borderMode=cv2.BORDER_REPLICATE)

    return best_angle, corrected

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle, corrected = correct_skew(image)
    print('Skew angle:', angle)
    cv2.imshow('corrected', corrected)
    cv2.waitKey()
Note: You may have to adjust the delta or limit values depending on the image. The delta value controls the iteration step, and the search iterates up to the limit value, which controls the maximum angle. This method is straightforward in that it iteratively checks each angle + delta, and it currently only corrects skew in the range of +/- 5 degrees. If you need to correct a larger angle, adjust the limit value. For another approach to handling skew, take a look at this alternative method.
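For example, if your labels can be skewed well beyond 5 degrees, one option is a coarse sweep followed by a fine sweep on the coarsely corrected image. This is only an illustrative sketch using the correct_skew() function above; the parameter values are arbitrary:
# Coarse pass: wide +/- 45 degree range in 1-degree steps.
coarse_angle, coarse_img = correct_skew(image, delta=1, limit=45)
# Fine pass on the coarsely corrected image: +/- 2 degrees in 0.1-degree steps.
fine_angle, corrected = correct_skew(coarse_img, delta=0.1, limit=2)
print('Total skew angle:', coarse_angle + fine_angle)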
Upvotes: 26
Reputation: 51
To add up to @nathancy answer, for windows users, if you're getting additional skew just add dtype=float
. Whenever you create a numpy array. There's a integer overflow issue with windows as it assigns int(32) bit as data type unlike rest of the systems.
See below code; added dtype=float
in np.sum()
methods:
import cv2
import numpy as np
from scipy.ndimage import interpolation as inter

def correct_skew(image, delta=1, limit=5):
    def determine_score(arr, angle):
        data = inter.rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1, dtype=float)
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2, dtype=float)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)

    return best_angle, rotated

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle, rotated = correct_skew(image)
    print(angle)
    cv2.imshow('rotated', rotated)
    cv2.imwrite('rotated.png', rotated)
    cv2.waitKey()
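For a concrete sense of why the default integer type overflows here, a small, purely illustrative demonstration (the 4000-pixel row width is made up):
import numpy as np

# Projection histogram alternating between an empty row and a fully white
# 4000-pixel row: the squared differences reach ~1e12, well beyond the
# ~2.1e9 maximum of a 32-bit integer.
hist_int32 = np.array([0, 255 * 4000] * 5, dtype=np.int32)  # Windows default int
hist_float = hist_int32.astype(float)                       # the dtype=float fix

wrong = np.sum((hist_int32[1:] - hist_int32[:-1]) ** 2)
right = np.sum((hist_float[1:] - hist_float[:-1]) ** 2)

print(wrong)  # silently wrapped, wrong score
print(right)  # ~9.36e12, the intended score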
Upvotes: 5
Reputation: 1246
ASSUMPTIONS:
- The tilt to be corrected is within the [-45:45] degree range.
- The content of the (cleaned-up) input image, taken as a whole, fits a rectangle reasonably tightly (see the last section below for when this does not hold).
SOLUTION:
hgt_rot_angle = cv2.minAreaRect(your_CLEAN_image_pixel_coordinates_to_enclose)[-1]
com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle
(h, w) = my_input_image.shape[0:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, com_rot_angle, 1.0)
corrected_image = cv2.warpAffine(your_ORIGINAL_image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
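As a side note on the first line: one common way to obtain the foreground pixel coordinates to enclose follows the pyimagesearch tutorial linked under ORIGINAL SOURCE below. The Otsu thresholding here is an assumption about the input, and the cast is there because cv2.minAreaRect() expects int32/float32 points:
import cv2
import numpy as np

# Threshold the (assumed BGR) original image and collect the (row, col)
# coordinates of every foreground pixel in the cleaned-up binary image.
gray = cv2.cvtColor(your_ORIGINAL_image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
your_CLEAN_image_pixel_coordinates_to_enclose = \
    np.column_stack(np.where(thresh > 0)).astype(np.float32)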
ORIGINAL SOURCE:
https://www.pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/ - a GREAT tutorial to get started (kudos to Adrian Rosebrock), BUT:
- the meaning of the angle returned by cv2.minAreaRect() is not quite clear there, and the code uses the same variable for detection and for correction, which is even more confusing. I used separate variables for clarity, and my explanation of the first two lines of code is below.
- the sign convention of the angle argument of the cv2.getRotationMatrix2D() function deserves clarification, based on OpenCV documentation and based on my testing. More on this below as well.
SOLUTION EXPLANATION:
The cv2.minAreaRect() function returns the rotation angle value in the [-90, 0] range as the last element of the returned tuple, and the angle value is tied to the HEIGHT value in the same returned tuple (it's located at cv2.minAreaRect()[1][1], to be precise, but we're not using it here).
Unless the angle of rotation is either -90.0 or 0.0, the decision of which dimension is chosen as the "height" is not arbitrary - it always has to go from upper left to lower right, i.e. it has to have a negative slope. What this means for our use case is that, depending on the width-height proportion of the content block and on its tilt, the "height" value returned by cv2.minAreaRect() can be either the content block's logical height OR its width.
This means 2 things for us:
So, given (1) no assumptions about the content block's aspect ratio and (2) the assumed [-45:45] range of the tilt, we can get the common tilt of the height and the width relative to the rectangular coordinate system (in the [-45:45] range) by simply adding 90 degrees to the rotation value of the "height" whenever it falls below -45.0.
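To make that normalization concrete, here is a tiny illustration with made-up cv2.minAreaRect() angles:
# Made-up angles from the [-90, 0] range that cv2.minAreaRect() returns:
for hgt_rot_angle in (-88.0, -46.0, -45.0, -3.0, 0.0):
    com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle
    print(hgt_rot_angle, '->', com_rot_angle)
# -88.0 -> 2.0   (the "height" was actually the tilted width)
# -46.0 -> 44.0
# -45.0 -> -45.0
# -3.0  -> -3.0
# 0.0   -> 0.0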
Once we have this detected and calculated "common rotation angle" value, we can use it to fix the tilt by simply passing the value directly to the cv2.getRotationMatrix2D() function.
NOTE: the calculated existing "common rotation angle" is negative for a counter-clockwise tilt and positive for a clockwise tilt, which is a very common everyday convention. However, if we think of the angle argument of cv2.getRotationMatrix2D() as "the correction angle to apply" (which, I think, was the intent), then the sign convention is the OPPOSITE. So we need to pass the detected and calculated "common rotation angle" value as-is if we want to see it counteracted in the output image, which is supported by the many tests that I have performed.
This is a direct quote on the angle parameter from the OpenCV documentation:
Rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the top-left corner).
WHAT IF THE SINGLE RECTANGLE IS A POOR FIT?
The above solution works very well for densely populated full page scans, clean labels and things like that, but it does not work well at all for sparsely populated images, where the overall tightest fit is not a rectangle, i.e. when the 2nd starting assumption does not hold.
In the latter scenario, the following may work IF most of the individual shapes in the input image can nicely fit into rectangles, or at least fit them better than all of the content combined; a sketch of this idea follows.
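One way to realize that per-shape idea (a hypothetical helper, not an exact recipe; it assumes a clean binary input with shapes in white, reasonably rectangular shapes, the OpenCV 4.x findContours() return signature, and the same [-90, 0] minAreaRect angle convention discussed above):
import cv2
import numpy as np

def estimate_skew_from_shapes(clean_binary_image, min_area=50):
    """Median of per-shape minAreaRect angles (a sketch of the idea above)."""
    contours, _ = cv2.findContours(clean_binary_image, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    angles = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:   # ignore specks and noise
            continue
        hgt_rot_angle = cv2.minAreaRect(contour)[-1]
        # Same normalization as in the single-rectangle solution above.
        angles.append(hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle)
    return float(np.median(angles)) if angles else 0.0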
OTHER SOURCES:
https://www.pyimagesearch.com/2015/11/30/detecting-machine-readable-zones-in-passport-images/
https://docs.opencv.org/master/dd/d49/tutorial_py_contour_features.html
Upvotes: 3