Ufuk Can Bicici

Reputation: 3649

OCR: How to localize characters in a serial number image?

I have the following problem: I have serial numbers that always consist of 2 lines of 7 characters (0-9 and A-Z), 14 characters in total. These serial numbers appear on images of various products; using a pipeline of image processing and geometric transformation algorithms, I am able to localize them into the following form:

[Image: cropped serial-number region after localization]

Now my aim is to read these serial numbers. I first tried the Tesseract API on the tightly cropped images. Unfortunately, either I failed to configure the API properly or this particular font is not in Tesseract's training set; either way, Tesseract could not parse the serial number correctly. So I quickly turned to custom solutions.
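For reference, a minimal Tess4J setup along the lines of what I attempted, with the output restricted to the 36 possible characters, would look something like this (the data path and file name are placeholders):

```java
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class SerialNumberOcr {
    public static void main(String[] args) throws TesseractException {
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("/path/to/tessdata");  // placeholder path
        tesseract.setLanguage("eng");
        // Restrict recognition to the known alphabet (digits + uppercase letters)
        tesseract.setTessVariable("tessedit_char_whitelist",
                "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
        // Page segmentation mode 6: treat the crop as a single uniform text block
        tesseract.setPageSegMode(6);
        System.out.println(tesseract.doOCR(new File("serial_crop.png")));
    }
}
```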

The obvious approach, since I know the aspect ratio and relative sizes of the characters, is to train a simple classifier (HOG + linear SVM) on labeled character and background images (I have to label characters anyway), run it in the classical sliding-window fashion, and apply non-maximum suppression to remove false positive detections. This brute-force approach does not seem very efficient to me, since 1) a lot of feature extraction + classification operations have to run for each window, and 2) I would have to manually label a lot of background (negative) samples, covering transition areas between two characters, the vertical space between the two lines, pure background, etc.

Since I can localize the serial number into a rectangle that contains only a solid background apart from the characters, I instead thought of a simple foreground/background segmentation scheme. The first thing I tried was to convert the image to grayscale, downscale it, run a low-pass filter to remove high-frequency noise, and apply Otsu thresholding. If I could localize each character almost perfectly, I could run a classifier on just its bounding box and would not need many labeled transition/background negatives. With the optimal blur kernel size, this pipeline gives the following result:

[Image: binarized crop after grayscale conversion, downscaling, blurring, and Otsu thresholding]
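In OpenCV/Java the pipeline is roughly the following (the downscale factor and blur kernel size are the knobs I tuned by hand; the values below are just examples):

```java
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public class Binarization {
    public static Mat binarize(Mat bgrInput) {
        Mat gray = new Mat();
        Imgproc.cvtColor(bgrInput, gray, Imgproc.COLOR_BGR2GRAY);

        // Downscale to suppress fine surface texture (factor is a tuning knob)
        Mat small = new Mat();
        Imgproc.resize(gray, small, new Size(), 0.5, 0.5, Imgproc.INTER_AREA);

        // Low-pass filter to remove high-frequency noise before thresholding
        Mat blurred = new Mat();
        Imgproc.GaussianBlur(small, blurred, new Size(5, 5), 0);

        // Otsu chooses the threshold automatically; INV makes characters white
        Mat binary = new Mat();
        Imgproc.threshold(blurred, binary, 0, 255,
                Imgproc.THRESH_BINARY_INV | Imgproc.THRESH_OTSU);
        return binary;
    }
}
```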

Now I am almost able to localize each character, but as you can see in the second image, bad lighting conditions let some noisy clutter through as foreground (especially around the 0 and the F, on the left side). Additional dilation/erosion operations on the binary image might reduce the non-character clutter, but they certainly would not eradicate it completely. My question is: how can I localize the characters at this stage, after Otsu thresholding? I know the width and height of each character (up to a small uncertainty caused by hand-crafted measurements), and I know that they always form two lines of 7 characters each. I am thinking of a connected-components algorithm that groups foreground pixels into blobs and then filters out blobs whose bounding boxes have inconsistent widths and heights, but this is still far from the coding stage. I am open to any similar ideas or examples. (If it helps, I use OpenCV with Java.)
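Roughly, what I have in mind is something like this, using connectedComponentsWithStats from OpenCV 3.x (the size limits here are hypothetical and would come from my measured character dimensions):

```java
import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Mat;
import org.opencv.core.Rect;
import org.opencv.imgproc.Imgproc;

public class CharacterBlobs {
    // Expected character size in pixels (hypothetical; measure on real crops)
    static final int MIN_W = 10, MAX_W = 30, MIN_H = 25, MAX_H = 50;

    public static List<Rect> findCharacterBoxes(Mat binary) {
        Mat labels = new Mat(), stats = new Mat(), centroids = new Mat();
        int n = Imgproc.connectedComponentsWithStats(binary, labels, stats, centroids);

        List<Rect> boxes = new ArrayList<>();
        for (int i = 1; i < n; i++) {  // label 0 is the background component
            int x = (int) stats.get(i, Imgproc.CC_STAT_LEFT)[0];
            int y = (int) stats.get(i, Imgproc.CC_STAT_TOP)[0];
            int w = (int) stats.get(i, Imgproc.CC_STAT_WIDTH)[0];
            int h = (int) stats.get(i, Imgproc.CC_STAT_HEIGHT)[0];
            // Keep only blobs whose size is consistent with a character
            if (w >= MIN_W && w <= MAX_W && h >= MIN_H && h <= MAX_H) {
                boxes.add(new Rect(x, y, w, h));
            }
        }
        return boxes;
    }
}
```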

Upvotes: 2

Views: 924

Answers (1)

user1196549


When the characters are isolated and in a single piece, connected components is the way to go. Just ignore the tiny blobs and use the bounding boxes.

Sometimes characters will have small protrusions (like the F), which make the character appear larger than it is. For a fixed-width font, you can adjust the box to the known size.
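For example, a minimal sketch that snaps a blob's bounding box to the known fixed character size while keeping its center (charW and charH are the measured dimensions, not values from the question):

```java
import org.opencv.core.Rect;

public class BoxSnap {
    // Replace a blob's bounding box with a fixed-size box around its center.
    public static Rect snapToCharSize(Rect blob, int charW, int charH) {
        int cx = blob.x + blob.width / 2;
        int cy = blob.y + blob.height / 2;
        return new Rect(cx - charW / 2, cy - charH / 2, charW, charH);
    }
}
```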

Sometimes characters will be split into two or three pieces. You can regroup the pieces by geometric considerations and a priori knowledge of the text structure.
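For instance, since the layout is known to be 2 lines of 7 characters, you can assign each blob to a grid cell by its center and union the boxes that land in the same cell. A rough sketch (rowSplitY, originX and cellWidth are assumptions to be derived from the crop geometry):

```java
import java.util.List;
import org.opencv.core.Rect;

public class Regroup {
    // Merge blob boxes into a fixed 2x7 grid using the known text layout.
    public static Rect[][] regroup(List<Rect> blobs, int rowSplitY,
                                   int originX, int cellWidth) {
        Rect[][] grid = new Rect[2][7];
        for (Rect b : blobs) {
            int row = (b.y + b.height / 2 < rowSplitY) ? 0 : 1;
            int col = Math.min(6,
                    Math.max(0, (b.x + b.width / 2 - originX) / cellWidth));
            // Union with whatever pieces already landed in this cell
            grid[row][col] = (grid[row][col] == null) ? b : union(grid[row][col], b);
        }
        return grid;
    }

    private static Rect union(Rect a, Rect b) {
        int x1 = Math.min(a.x, b.x), y1 = Math.min(a.y, b.y);
        int x2 = Math.max(a.x + a.width, b.x + b.width);
        int y2 = Math.max(a.y + a.height, b.y + b.height);
        return new Rect(x1, y1, x2 - x1, y2 - y1);
    }
}
```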

In such cases, achieving 100% reliability is a real challenge.

Upvotes: 2
