Reputation: 842
I've got a little side project that I'd like to use for playing with computer vision. I have a scan of a document that has some words circled, or more specifically surrounded by 2 parallel horizontal lines joined by curves at each end. Similar to the word search worksheets that elementary school children work on, but with cleaner lines and only horizontal.
The goal is to extract the circled bits and pass only those portions to an OCR process to get the circled text.
I've used OpenCV a bit before for facial recognition using some of the packaged Haar cascades. Would a similar approach work for simple shapes, or are there lower-level approaches within OpenCV that would work well?
Upvotes: 0
Views: 1392
Reputation: 3522
If the lines are always the same (or a similar) color, you can use the inRange function to isolate them. Then use the findContours function to find the contours of all circled areas, fill them with white pixels, and apply a bitwise AND between this mask and the original image. The result contains only the circled areas (with the lines included; if you want to avoid this, try the erode and dilate functions).
I've used OpenCV a bit before for facial recognition using some of the packaged Haar cascades. Would a similar approach work for simple shapes, or are there lower-level approaches within OpenCV that would work well?
I think it's possible to create a Haar cascade that will find circled areas, but:
Upvotes: 2
Reputation: 20018
A nice, simple method for detecting lines in an image is the Hough Transform. It basically acts as an accumulation buffer of line parameters: each foreground pixel votes for every line that could pass through it, and real lines show up as peaks in the buffer. This should be able to detect your long parallel lines fairly readily, and distinguish between them and the letters by thresholding the vote counts, since short letter strokes collect far fewer votes. Then you can iterate through the lines and extract a region from parallel pairs to segment the letters.
Upvotes: 1