moeseth

Reputation: 1945

How to find number from image in OCR?

I'm trying to extract the digit contours from an image. The original image is stored in number_img:

enter image description here

Here is the code I used, followed by its result:

gray = cv2.cvtColor(number_img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (1, 1), 0)
ret, thresh = cv2.threshold(blur, 70, 255, cv2.THRESH_BINARY_INV)

# OpenCV 3.x returns three values here; OpenCV 4.x returns only (contours, hierarchy).
img2, contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    area = cv2.contourArea(c)
    x, y, w, h = cv2.boundingRect(c)

    if 50 < area < 1000:
        cv2.rectangle(number_img, (x, y), (x + w, y + h), (0, 0, 255), 2)

enter image description here

Since small boxes show up between the digits, I also tried filtering by height:

if 50 < area < 1000 and h > 50:
    cv2.rectangle(number_img, (x, y), (x + w, y + h), (0, 0, 255), 2)

enter image description here

What else can I do to get the best digit contours for OCR?

Thanks.

Upvotes: 2

Views: 1944

Answers (1)

I just tried this in Matlab. Hopefully you can adapt the code to OpenCV and tweak some parameters. It is not clear whether the rightmost blob is a number or not.

img1 = imread('DSYEW.png');
% First, convert the grayscale image you provided to a binary (logical)
% image. Binarizing is a common first step in image preprocessing.
% I used a threshold of .28 based on your image, but you may need to
% change it for a general solution.
img = im2bw(img1,.28);

% Then use the Matlab 'regionprops' function to identify the individual
% blobs in the binary image. By default it returns the Area, Centroid and
% BoundingBox of each blob.
s = regionprops(imcomplement(img));

% Now as you did, we can filter out the bounding boxes with an area
% threshold. I used 350 originally. But it can be changed for a better 
% output.
s([s.Area] < 350) = [];

% Now we draw each bounding box on the image.
figure; imshow(img);
for k = 1 : length(s)
  bb = s(k).BoundingBox;
  rectangle('Position', [bb(1),bb(2),bb(3),bb(4)],...
  'EdgeColor','r','LineWidth',2 )
end

Output image:

enter image description here

Update 1:

I just changed the area parameter in the above code as follows. Unfortunately I don't have Python OpenCV on my Mac, but it is all about tweaking the parameters in your code.

s([s.Area] < 373) = [];

Output image:

enter image description here

Update 2:

Digits 3 and 4 in the figure above were detected as a single digit. If you look carefully, you can see that 3 and 4 are connected to each other, which is why the code above detected them as one blob. So I used the imdilate function to separate them. (The digits are black in img, so dilating the white background thins the digits and breaks the narrow connection.) Next, in your code, even the white holes inside some digits were detected as digits. To eliminate that, we can fill the holes using imfill in Matlab.

Updated code:

img1 = imread('TCXeuO9.png');
img = im2bw(img1,.28);

img = imcomplement(img);
img = imfill(img,'holes');
img = imcomplement(img);

se = strel('line',2,90);
img = imdilate(img, se);

s = regionprops(imcomplement(img));
s([s.Area] < 330) = [];

figure; imshow(img);
for k = 1 : length(s)
  bb = s(k).BoundingBox;
  rectangle('Position', [bb(1),bb(2),bb(3),bb(4)],...
  'EdgeColor','r','LineWidth',2 )
end

Output image:

enter image description here

Upvotes: 3
