Reputation: 148
I'm working on OCR of energy meter displays: example 1, example 2, example 3.
I tried tesseract-ocr with the letsgodigital trained data, but the results are very poor.
I'm fairly new to the topic and this is what I've done:
import cv2
from skimage import exposure
from pytesseract import image_to_string
# cv2_imshow is the Google Colab helper (assumed here); elsewhere use cv2.imshow
from google.colab.patches import cv2_imshow

def process_image(orig_image_arr):
    # The caller passes an RGB image, so convert from RGB (not BGR) to grayscale
    gry_disp_arr = cv2.cvtColor(orig_image_arr, cv2.COLOR_RGB2GRAY)
    # Stretch contrast to 0-255; cast back to 8-bit because Otsu needs uint8
    # and some scikit-image versions return floats when out_range is a tuple
    gry_disp_arr = exposure.rescale_intensity(gry_disp_arr, out_range=(0, 255)).astype("uint8")
    # Otsu thresholding
    _ret, thresh = cv2.threshold(gry_disp_arr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return thresh

def ocr_image(orig_image_arr):
    otsu_thresh_image = process_image(orig_image_arr)
    cv2_imshow(otsu_thresh_image)
    return image_to_string(otsu_thresh_image, lang="letsgodigital", config="--psm 8 -c tessedit_char_whitelist=.0123456789")

img = cv2.imread('test2.jpg')
cnv = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
text = ocr_image(cnv)
This gives very poor results with the example images. I have a couple of questions:
How can I identify the four corners of the display? (Edge detection doesn’t seem to work very well)
Is there any further preprocessing that I can do to improve the performance?
Thanks for any help.
Upvotes: 2
Views: 3275
Reputation: 1121
Notice that your power meters use either blue or green LEDs to light up the display; I suggest you use this display color to your advantage. I would select only one color channel based on the LED color, then threshold it using some algorithm or assumption. After that, you can do the downstream steps of cropping, resizing, transformation, OCR, etc.
For example, using your example image 1, look at its histogram here. Notice how there is a small peak of green to the right of the 150 mark.
I take advantage of this and set anything below 150 to zero, on the assumption that the green peak is the bright green LED in the image.
import cv2
img = cv2.imread('example_1.jpg', 1)
# Get only the green channel (OpenCV stores channels as B, G, R); copy so img is left unmodified
img_g = img[:, :, 1].copy()
# Set threshold for green value, anything less than 150 becomes zero
img_g[img_g < 150] = 0
This is what I get. This should be much easier for downstream OCR now.
# You should also set anything >= 150 to max value as well, but I didn't in this example
img_g[img_g >= 150] = 255
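The two masking lines can be wrapped into one small function. This is only a sketch: `binarize_channel` and its `channel` and `cutoff` parameters are names I made up for illustration, and the tiny synthetic array stands in for a real photo loaded with `cv2.imread`.

```python
import numpy as np

def binarize_channel(img_bgr, channel=1, cutoff=150):
    """Return a 0/255 mask from one channel of a BGR image (hypothetical helper)."""
    # Copy so the original image is not modified through a view.
    chan = img_bgr[:, :, channel].copy()
    # Pixels at or above the cutoff become 255, everything else becomes 0.
    return np.where(chan >= cutoff, 255, 0).astype(np.uint8)

# Synthetic 2x2 BGR image: dark background with one bright-green pixel.
demo = np.zeros((2, 2, 3), dtype=np.uint8)
demo[0, 0] = (20, 200, 30)  # B, G, R
mask = binarize_channel(demo)
```

For a display lit by blue LEDs, `channel=0` would presumably be the one to try, with a cutoff read off that image's own histogram.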
The steps above should replace this step in your code:
_ret, thresh = cv2.threshold(img_g, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
Here's the output of this step.
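The 150 cutoff was read off the histogram of example image 1, so other images may need a different value. Here is a rough sketch of how the per-channel histograms could be computed with NumPy alone; `channel_histograms` and `fraction_above` are hypothetical helper names, and the synthetic array again stands in for a real image.

```python
import numpy as np

def channel_histograms(img_bgr):
    """Return {channel name: pixel counts per intensity 0..255} for a BGR uint8 image."""
    names = ("blue", "green", "red")  # OpenCV channel order
    return {
        name: np.bincount(img_bgr[:, :, i].ravel(), minlength=256)
        for i, name in enumerate(names)
    }

def fraction_above(img_bgr, channel, cutoff):
    """Fraction of pixels in one channel that are at or above the cutoff."""
    return float(np.mean(img_bgr[:, :, channel] >= cutoff))

# Synthetic 4x4 image: uniform dark gray with four bright pixels in the green channel.
demo = np.full((4, 4, 3), 40, dtype=np.uint8)
demo[1:3, 1:3, 1] = 210
hists = channel_histograms(demo)
```

If one channel shows a separate peak at the bright end, a cutoff in the gap below that peak is a reasonable first guess, and `fraction_above` gives a quick check on how much of the image survives it.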
Upvotes: 1