Reputation: 713
I have a binary image (only 0 and 255 pixels) like the one below.
I want to extract bounding boxes around the letters such as A, B, C and D. The image is large (around 4000x4000) and the letters can be quite small (like B and D above). Moreover, the characters are broken: there are gaps of black pixels within the outline of a character (such as A below).
The image also contains white noise, in the form of streaks of white lines scattered around the image.
What I have tried:
Extracting contours - The issue is that, for broken characters (like "A"), multiple disconnected contours are obtained for a character. I am not able to obtain a contour for the entire character.
Dilation to join edges - This solves the disconnected contours (for large characters) to a certain extent. However, with dilation, I lose a lot of information about smaller characters which now appear like blocks of white pixels.
I thought of clustering nearby pixels but have not been able to come up with a well-defined solution.
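One way to make the clustering idea concrete is to cluster contour bounding boxes rather than pixels. The sketch below is only an illustration under assumptions, not a tested solution for this image: it assumes each contour has already been reduced to an `(x, y, w, h)` box (the layout `cv2.boundingRect` returns), and the names `merge_fragment_boxes`, `drop_streaks`, `max_gap` and `max_aspect` are hypothetical. Thin, elongated boxes are discarded first as presumed streaks, then boxes separated by at most `max_gap` pixels are merged, so no dilation is applied to the image itself.

```python
from typing import Dict, List, Tuple

# (x, y, width, height), the layout returned by e.g. cv2.boundingRect
Box = Tuple[int, int, int, int]


def boxes_are_close(a: Box, b: Box, max_gap: int) -> bool:
    """True if the empty space between two boxes is <= max_gap on both axes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    gap_x = max(bx - (ax + aw), ax - (bx + bw), 0)
    gap_y = max(by - (ay + ah), ay - (by + bh), 0)
    return gap_x <= max_gap and gap_y <= max_gap


def drop_streaks(boxes: List[Box], max_aspect: float) -> List[Box]:
    """Discard long, thin boxes (assumed here to be noise streaks)."""
    return [b for b in boxes if max(b[2], b[3]) <= max_aspect * min(b[2], b[3])]


def merge_fragment_boxes(boxes: List[Box], max_gap: int) -> List[Box]:
    """Group boxes that are close to each other (union-find), one box per group."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison: O(n^2), fine for a sketch but slow for many fragments.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_are_close(boxes[i], boxes[j], max_gap):
                parent[find(i)] = find(j)

    # Accumulate the enclosing (x0, y0, x1, y1) rectangle of every group.
    groups: Dict[int, Tuple[int, int, int, int]] = {}
    for i, (x, y, w, h) in enumerate(boxes):
        root = find(i)
        x0, y0, x1, y1 = groups.get(root, (x, y, x + w, y + h))
        groups[root] = (min(x0, x), min(y0, y), max(x1, x + w), max(y1, y + h))
    return sorted((x0, y0, x1 - x0, y1 - y0) for x0, y0, x1, y1 in groups.values())


# Made-up boxes: three fragments of one letter, one small letter, one streak.
fragments = [(10, 10, 5, 20), (17, 10, 5, 20), (12, 18, 8, 3),
             (100, 100, 6, 6), (200, 50, 80, 2)]
letters = merge_fragment_boxes(drop_streaks(fragments, max_aspect=8), max_gap=3)
print(letters)  # -> [(10, 10, 12, 20), (100, 100, 6, 6)]
```

Both thresholds would need tuning: `max_gap` probably has to scale with character size, since a gap that joins the pieces of a large "A" could also glue small neighbouring letters together, and a strict `max_aspect` could throw away thin strokes that belong to real characters.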
I would appreciate any ideas. Thanks!
Upvotes: 0
Views: 182
Reputation: 220
How about this procedure?
The details of each step are up to you.
Also, search for an example of MNIST recognition. MNIST is a dataset of handwritten digits, and there are lots of examples for it (even for noisy MNIST).
Upvotes: 1