Ali

Reputation: 267

Algorithm for parsing characters from an image for OCR

I'm working on OCR, and right now I'm working on parsing each individual character away from the others. E.g if I have an image that says the following:

12345678.90

I want to detect the x,y coordinates of where each number starts and where it ends in the image, so that I can determine how many numbers there are to process, and to then parse out each individual number / character, and process it.

I have devised a simple algorithm for doing it, and I want some opinions / reviews on how it could be improved.

(In this application, I have to only process numbers, but if this algorithm could also parse out letters, that'd be even better).

There is a pixel or so of gap in the background color between characters. It may not be visible to us, but it is there, and the program will find it as it goes pixel by pixel horizontally, reading the colors. That tells it where a character ends horizontally. So, for example, it might detect the background-color pixel at (15, 30).
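To illustrate the idea, here is a rough sketch of the column-scan approach (Python/NumPy is just an assumed setup for illustration; the array name and the threshold-to-binary step are placeholders):

    import numpy as np

    def find_character_spans(binary_img):
        """Return (start_x, end_x) pairs for runs of columns that contain
        at least one foreground pixel. Assumes foreground pixels are non-zero."""
        # A column is "occupied" if any pixel in it is foreground.
        occupied = binary_img.any(axis=0)

        spans = []
        start = None
        for x, col_has_ink in enumerate(occupied):
            if col_has_ink and start is None:
                start = x                      # a character starts here
            elif not col_has_ink and start is not None:
                spans.append((start, x - 1))   # a background gap ends the character
                start = None
        if start is not None:                  # character runs to the right edge
            spans.append((start, len(occupied) - 1))
        return spans

    # Each (start, end) pair can then be used to crop one digit:
    # digits = [binary_img[:, s:e + 1] for s, e in find_character_spans(binary_img)]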

Could this algorithm be improved, and/or am I correct in my assumption on step 6?

Upvotes: 3

Views: 5311

Answers (3)

TripeHound

Reputation: 2970

I've not tried to write OCR software, but we do use it, and it can get very complicated.

It's not totally clear where your image is coming from; if it's a scanned image, then there are several complications. Not least, with regard to your plan, even if there is a gap between digits it may not be vertical (it's very unlikely that the scanned page will be perfectly straight). Other factors include "speckle" -- random dots caused by dirt etc. on the page or the scanner. If you're processing this kind of image, you almost certainly need to look towards image-processing techniques that apply mathematical operations to the whole array of pixels, to do things like deskew (straighten the image), despeckle (get rid of random dots), and edge enhancement (strengthen changes from light to dark to bring out lines).
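To give a feel for those steps, here is a rough sketch using OpenCV; the file name, filter size and skew angle are placeholders rather than tuned values, and the skew estimation itself is left as an exercise:

    import cv2

    # Hypothetical input file; parameter values below are guesses, not tuned.
    img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

    # Despeckle: a small median filter removes isolated dirt/dust pixels.
    clean = cv2.medianBlur(img, 3)

    # Binarise with Otsu's threshold, inverted so ink becomes white (non-zero).
    _, binary = cv2.threshold(clean, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Deskew: once the skew angle has been estimated (e.g. from a Hough-line
    # fit or the minimum-area rectangle around the ink), rotate the page back.
    skew_angle = 2.0                      # placeholder; estimate this from the image
    h, w = binary.shape
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), skew_angle, 1.0)
    deskewed = cv2.warpAffine(binary, rotation, (w, h), flags=cv2.INTER_NEAREST)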

From your use of "background" and "foreground" colours, it may be that you're trying to "OCR" an image from the screen? If so (some kind of "screen-scraping" process), and you know (or can be trained with) the specific character-shapes being interpreted, then a variant of the sliding window may help: you slide the known image of a '5' around the image at different offsets: if all the pixels of the '5' match "foreground" pixels in the image, then you know you've found a '5'. Repeat for other digits. As above, this is a "virtual" window we're talking about.
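A very rough sketch of that matching idea, using OpenCV's template matching as a stand-in for a hand-rolled pixel comparison (the file names and the score threshold are assumptions):

    import cv2
    import numpy as np

    screen = cv2.imread("screen.png", cv2.IMREAD_GRAYSCALE)       # image to search
    template_5 = cv2.imread("digit_5.png", cv2.IMREAD_GRAYSCALE)  # known shape of '5'

    # Slide the template over the image and score each offset.
    scores = cv2.matchTemplate(screen, template_5, cv2.TM_CCOEFF_NORMED)

    # Keep offsets where the match is nearly perfect (threshold is a guess).
    ys, xs = np.where(scores >= 0.95)
    for x, y in zip(xs, ys):
        print(f"possible '5' at top-left corner ({x}, {y})")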

Upvotes: 1

kudkudak

Reputation: 496

A common approach that I know of for segmenting digits is the sliding window. The basic idea is that you slide a window of some size over the image of the digits.

Each position of the sliding window produces an image (you look only at the pixels covered by the window). The window should be narrow. You can then train a classifier that maps each window to 1 or 0, where 1 indicates that the window is centered on a split between two digits, and 0 indicates that it is not.

You would need some training data for the classifier, or you could try unsupervised learning.
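To make the idea concrete, here is a sketch of how the windows and a classifier could be wired together; scikit-learn, the window size, the classifier choice and the random stand-in data are all assumptions, used only so the sketch runs end to end:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    WIN_W, IMG_H = 8, 28   # window width and image height (assumed values)

    def windows(image, width=WIN_W, step=1):
        """Yield (x, window) pairs for a narrow window slid across the image."""
        for x in range(0, image.shape[1] - width + 1, step):
            yield x, image[:, x:x + width]

    # In practice X_train/y_train come from hand-labelled windows:
    # label 1 = window centred on a split between two digits, 0 = otherwise.
    # Random data is used here only as a placeholder.
    rng = np.random.default_rng(0)
    X_train = rng.integers(0, 2, size=(200, IMG_H * WIN_W))
    y_train = rng.integers(0, 2, size=200)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Segmentation: slide over a new line of digits and keep the columns
    # whose window the classifier calls a split point.
    digits_image = rng.integers(0, 2, size=(IMG_H, 120))          # stand-in image
    split_columns = [x for x, w in windows(digits_image)
                     if clf.predict(w.ravel().reshape(1, -1))[0] == 1]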

EDIT: This video may be useful: https://www.youtube.com/watch?v=y6ga5DeVgSY

Upvotes: 2

Martijn Courteaux

Reputation: 68907

DISCLAIMER: I never wrote any OCR-like software before.

To me, your algorithm seems a bit off, for the following reasons:

  • A 1 does not start where you find its first pixel at the bottom, because there is still the little stroke that points to the left at the top of the 1.
  • A 2 would end up only a few pixels high, since you go straight up until you find a background pixel.
  • A 3 would end up only 1 pixel by 1 pixel, for the same reason as above.
  • etc...

I would try a recursive algorithm that follows the foreground-color pixels as far as it can without crossing into background pixels. With big images and big characters this might cause a stack overflow, so it would be better to do the trick with loops instead of a recursive function.
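For example, here is a sketch of the non-recursive version using an explicit stack (rather than plain for loops); a binary NumPy image where non-zero means foreground is assumed:

    import numpy as np

    def trace_character(binary_img, start_y, start_x):
        """Collect all foreground pixels connected to (start_y, start_x),
        using an explicit stack instead of recursion to avoid deep call chains."""
        h, w = binary_img.shape
        visited = np.zeros(binary_img.shape, dtype=bool)
        stack = [(start_y, start_x)]
        pixels = []
        while stack:
            y, x = stack.pop()
            if y < 0 or y >= h or x < 0 or x >= w:
                continue
            if visited[y, x] or binary_img[y, x] == 0:
                continue
            visited[y, x] = True
            pixels.append((y, x))
            # Visit the 4-connected neighbours (use 8 neighbours for thin strokes).
            stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
        return pixels

    # The bounding box of the collected pixels gives the character's extent:
    # ys, xs = zip(*pixels); box = (min(xs), min(ys), max(xs), max(ys))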

If you are doing this pixel-by-pixel discovery of one character, you can use that process to build vector information about what the character looks like. I think that would be a cool starting point for recognizing the characters.

Upvotes: 1
