I'm working on extracting text from images similar to the one shown below: warehouse boxes with all kinds of different labels. The images often have poor angles.
My code:
import cv2
import pytesseract

im = cv2.imread('1.jpg')
config = '-l eng --oem 1 --psm 3'
text = pytesseract.image_to_string(im, config=config)
text_list = text.split('\n')
# strip whitespace and drop blank entries so that only words remain
space_to_empty = [x.strip() for x in text_list]
space_clean_list = [x for x in space_to_empty if x]
print(space_clean_list)
For example, that image returns an output of
['L2 Sy', "////’7/'7///////////////"]
on all variations of --oem and --psm values.
Perspective correction for the image gives a slightly better (though still poor) output of
['R19 159 942 sEMY', 'V/ ////////////////////I////I/////////////']
again on all variations of --oem and --psm values.
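The sweep over modes looked roughly like this (a minimal sketch; the exact ranges are assumptions, and some OEM/PSM combinations can fail depending on the installed Tesseract build):
import cv2
import pytesseract

im = cv2.imread('1.jpg')
for oem in range(4):           # OEM modes 0-3
    for psm in range(3, 14):   # PSM modes 3-13 (PSM 0 is OSD-only)
        try:
            config = f'-l eng --oem {oem} --psm {psm}'
            print(oem, psm, repr(pytesseract.image_to_string(im, config=config)))
        except pytesseract.TesseractError:
            pass               # skip combinations unsupported by this build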
My question is: for all values of --oem and --psm, as shown here, the output stays the same. Is this expected?
Your perspective correction is insufficient. Unfortunately, you haven't provided your code for that step, so I will present my full solution:
1. Mask the label in the image using thresholding, some morphological operations, contour finding, and extraction of the central contour, assuming the label is (always) located in the center of the image.
2. Properly perform the perspective transform of the label to some upright rectangle.
3. Run pytesseract with the --psm 6 option.
That'd be the full code:
import cv2
import numpy as np
import pytesseract
# Read image
img = cv2.imread('input.jpg')
h, w = img.shape[:2]
# Mask label
# Threshold nearly white pixels to isolate the bright label
mask = np.all(img > 240, axis=2).astype(np.uint8) * 255
# Remove small speckles, then close gaps within the label region
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21)))
# Find contours; handle differing return signatures across OpenCV versions
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
# Keep only contours containing the image center, i.e. the label
cnts = [cnt for cnt in cnts if cv2.pointPolygonTest(cnt, (w // 2, h // 2), False) > 0]
mask = cv2.drawContours(np.zeros_like(mask), cnts, -1, 255, cv2.FILLED)
# Find extreme outer points of label
# https://stackoverflow.com/a/56801276/11089932
x, y, w, h = cv2.boundingRect(mask)
l = (x, np.argmax(mask[:, x]))                   # leftmost: first white pixel in left column
r = (x + w - 1, np.argmax(mask[:, x + w - 1]))   # rightmost: first white pixel in right column
t = (np.argmax(mask[y, :]), y)                   # topmost: first white pixel in top row
b = (np.argmax(mask[y + h - 1, :]), y + h - 1)   # bottommost: first white pixel in bottom row
# Perspective transform of label
# https://stackoverflow.com/a/65990763/11089932
bw, bh = 400, 200
# Map the extreme points to the corners of an upright bw x bh rectangle
# (destination order: top-left, bottom-left, bottom-right, top-right)
pts1 = np.float32([t, l, b, r])
pts2 = np.float32([[0, 0], [0, bh - 1], [bw - 1, bh - 1], [bw - 1, 0]])
M = cv2.getPerspectiveTransform(pts1, pts2)
warped = cv2.warpPerspective(img, M, (bw, bh))
# Raw OCR on transformed label
text = pytesseract.image_to_string(warped, config='--psm 6')
print(text.replace('\f', ''))
# POS | Registered
# R RR19 159 942 5MY
# WU UAV UMBRUE OE RT
As you can see, the raw OCR is already quite good. You're free to further pre-process the warped image to cut out the header, the barcode, and so on; see the sketch below.
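A minimal sketch of such a crop (the fractions for the header strip and the barcode region are guesses to be tuned per label layout, not values from the pipeline above):
# Hypothetical crop: drop the top header strip and the barcode area on the right
label = warped[int(0.25 * bh):, :int(0.75 * bw)]
text = pytesseract.image_to_string(label, config='--psm 6')
print(text.replace('\f', ''))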
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.19041-SP0
Python: 3.9.1
PyCharm: 2021.1.2
NumPy: 1.20.3
OpenCV: 4.5.2
pytesseract: 5.0.0-alpha.20201127
----------------------------------------