Reputation: 330
I have an image containing multiple red rectangles; the rectangle extraction works and the output is good.
I'm using https://github.com/autonise/CRAFT-Remade for text detection.
Original:
My image:
I tried to extract the text inside each rectangle with pytesseract, but without success. The output was:
r
2
aseeaaei
ae
How can I extract the text from this image accurately?
Part of my code:
import os
import cv2
import numpy as np

def saveResult(img_file, img, boxes, dirname='./result/', verticals=None, texts=None):
    """ save text detection result one by one
    Args:
        img_file (str): image file name
        img (array): raw image context
        boxes (array): array of result file
            Shape: [num_detections, 4] for BB output / [num_detections, 4] for QUAD output
    Return:
        None
    """
    img = np.array(img)

    # make result file list
    filename, file_ext = os.path.splitext(os.path.basename(img_file))

    # result directory
    res_file = dirname + "res_" + filename + '.txt'
    res_img_file = dirname + "res_" + filename + '.jpg'

    if not os.path.isdir(dirname):
        os.mkdir(dirname)

    with open(res_file, 'w') as f:
        for i, box in enumerate(boxes):
            # write the polygon coordinates as one comma-separated line
            poly = np.array(box).astype(np.int32).reshape((-1))
            strResult = ','.join([str(p) for p in poly]) + '\r\n'
            f.write(strResult)

            # draw the detection box on the image
            poly = poly.reshape(-1, 2)
            cv2.polylines(img, [poly.reshape((-1, 1, 2))], True, color=(0, 0, 255), thickness=2)  # HERE
            ptColor = (0, 255, 255)
            if verticals is not None:
                if verticals[i]:
                    ptColor = (255, 0, 0)

            # overlay the recognized text, if provided
            if texts is not None:
                font = cv2.FONT_HERSHEY_SIMPLEX
                font_scale = 0.5
                cv2.putText(img, "{}".format(texts[i]), (poly[0][0]+1, poly[0][1]+1), font, font_scale, (0, 0, 0), thickness=1)
                cv2.putText(img, "{}".format(texts[i]), tuple(poly[0]), font, font_scale, (0, 255, 255), thickness=1)

    # Save result image
    cv2.imwrite(res_img_file, img)
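The extraction attempt that produced the garbled output above was, in essence, cropping each detected box and running pytesseract on the crop — something along these lines (a hypothetical sketch, not my exact code; `boxes` is the CRAFT output passed to `saveResult`, and the file name is assumed):

import cv2
import numpy as np
import pytesseract

# Hypothetical sketch: crop each CRAFT box and OCR the crop.
image = cv2.imread('my_image.png')  # assumed input file name
for box in boxes:  # boxes: CRAFT detection output, as passed to saveResult()
    poly = np.array(box).astype(np.int32).reshape(-1, 2)
    x, y, w, h = cv2.boundingRect(poly)            # axis-aligned box around the quad
    roi = image[y:y + h, x:x + w]
    text = pytesseract.image_to_string(roi, config='--psm 7')  # treat crop as one text line
    print(text.strip())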
After your comment, here's the result:
The Tesseract output is good for a first test, but not accurate:
400
300
200
“2615
1950
24
16
Upvotes: 3
Views: 3523
Reputation: 46670
When using Pytesseract to extract text, preprocessing the image is extremely important. In general, we want to preprocess the image so that the desired text is black on a white background. To do this, we can use Otsu's threshold to obtain a binary image, then perform morphological operations to filter and remove noise. Here's a pipeline:
1. After converting to grayscale, we resize the image using imutils.resize(), then apply Otsu's threshold to obtain a binary image. The image is now only black and white, but there is still unwanted noise.
2. From here we invert the image and perform morphological operations with a horizontal kernel. This step merges the text into single contours, which lets us filter out and remove the unwanted lines and small blobs.
3. Now we find contours and filter using a combination of contour approximation, aspect ratio, and contour area to isolate the unwanted sections. (In the intermediate image, the removed noise was highlighted in green.)
4. Now that the noise is removed, we invert the image again so the desired text is black, then perform text extraction. I've also noticed that adding a slight blur enhances recognition; this cleaned image is what we run text extraction on.
We give Pytesseract the --psm 6 configuration since we want to treat the image as a uniform block of text. Here's the result from Pytesseract:
6745 63 6 10.50
2245 21 18 17
525 4 22 0.18
400 4 a 0.50
300 3 4 0.75
200 2 3 0.22
2575 24 3 0.77
1950 ii 12 133
The output isn't perfect, but it's close. You can experiment with additional configuration settings; see the sketch after the code below.
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Resize, grayscale, Otsu's threshold
image = cv2.imread('1.png')
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Invert image and perform morphological operations
inverted = 255 - thresh
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,3))
close = cv2.morphologyEx(inverted, cv2.MORPH_CLOSE, kernel, iterations=1)
# Find contours and filter using aspect ratio and area
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.01 * peri, True)
    x,y,w,h = cv2.boundingRect(approx)
    aspect_ratio = w / float(h)
    if (aspect_ratio >= 2.5 or area < 75):
        cv2.drawContours(thresh, [c], -1, (255,255,255), -1)
# Blur and perform text extraction
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('close', close)
cv2.imshow('thresh', thresh)
cv2.waitKey()
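For example, since the targets here are mostly numbers, restricting Tesseract to a digit whitelist and trying other page segmentation modes is a reasonable next experiment (a sketch; whether it actually helps depends on the image):

# Experiments on the cleaned `thresh` image from above.
# Restrict the character set to digits and a decimal point:
digit_config = '--psm 6 -c tessedit_char_whitelist=0123456789.'
print(pytesseract.image_to_string(thresh, lang='eng', config=digit_config))

# --psm 4 treats the image as a single column of text of variable sizes:
print(pytesseract.image_to_string(thresh, lang='eng', config='--psm 4'))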
Upvotes: 2