Paul Szabo
Paul Szabo

Reputation: 73

How to extract dotted text from image?

I'm working on my bachelor's degree final project and I want to create an OCR for bottle inspection with python. I need some help with text recognition from the image. Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?

I tried image processing operations on the image and I used pytesseract to recognize the characters.

Using the code bellow I got from this photo:

enter image description here

to this one:

enter image description here

and then to this one:

enter image description here

Sharpen function:

def sharpen(img):
  sharpen = iaa.Sharpen(alpha=1.0, lightness = 1.0)
  sharpen_img = sharpen.augment_image(img)
  return sharpen_img

Image processing code:

textZone = cv2.pyrUp(sharpen(originalImage[y:y + h - 1, x:x + w - 1])) #text zone cropped from the original image

sharp = cv2.cvtColor(textZone, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(sharp, 127, 255, cv2.THRESH_BINARY)

#the functions such as opening are inverted (I don't know why) that's why I did opening with MORPH_CLOSE parameter, dilatation with erode and so on

kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
open = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel_open)

kernel_dilate = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,7))
dilate = cv2.erode(open,kernel_dilate)

kernel_close = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))
close = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, kernel_close)

print(pytesseract.image_to_string(close))

This is the result of pytesseract.image_to_string:

22203;?!)

92:53 a

The expected result is :

22/03/20

02:53 A

Upvotes: 6

Views: 2197

Answers (3)

nathancy
nathancy

Reputation: 46670

One potential thing you could do to improve recognition on the characters is to dilate the characters so pytesseract gives a better result. Dilating the characters will connect the individual blobs together and can fix the / or the A characters. So starting with your latest binary image:

Original

Dilate with a 3x3 kernel with iterations=1 (left) or iterations=2 (right). You can experiment with other values but don't do it too much or the characters will all connect. Maybe this will provide a better result with you OCR.

import cv2

image = cv2.imread("1.PNG")
thresh = cv2.threshold(image, 115, 255, cv2.THRESH_BINARY_INV)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
dilate = cv2.dilate(thresh, kernel, iterations=1)
final = cv2.threshold(dilate, 115, 255, cv2.THRESH_BINARY_INV)[1]

cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.imshow('final', final)
cv2.waitKey(0)

Upvotes: 1

Dave W. Smith
Dave W. Smith

Reputation: 24966

"Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?"

First, kudos for taking this project on and getting this far with it. What you have from the OpenCV/cv2 standpoint looks pretty good.

Now, if you're thinking of Tesseract to carry you the rest of the way, at the very least you'll have to train it. Here you have a tough choice: Invest in training Tesseract, or work up a CNN to recognize a limited alphabet. If you have a way to segment the image, I'd be tempted to go with the latter.

Upvotes: 2

binaryDragon
binaryDragon

Reputation: 43

From the result you got and the expected result, you can see that some of the characters are recognized correctly. Assuming you are using a different image from that shown in the tutorial, I recommend you to change the values of threshold and getStructuringElement.

These values work better depending on the image color. The tutorial author must have optimized it for his/her use (by trial and error or some other way).

Here is a video if you want to play around with those value using sliders in opencv. You can also print your result in the same loop to see if you are getting the desired result.

Upvotes: 2

Related Questions