Reputation: 295
I want to find the text lines in a page of text (like from a book).
Sample image:
One constraint is that I want to implement this in JavaScript, and the best computer vision library I have found so far is JSFeat: http://inspirit.github.io/jsfeat/#imgproc
Therefore I am limited to the algorithms implemented in JSFeat (or another JS library).
I thought of running feature detection on the page and then doing statistics on the detected points to find the lines, but I'm not sure whether that's a good idea or how it could be done.
For example this is the output of FAST when applied on that image.
The solution should work regardless of the font used. Tolerance to slight rotation would be even better.
Help much appreciated!
Upvotes: 1
Views: 677
Reputation: 5467
My approach would be to count the number of vertical edges on each horizontal scanline. Each letter will produce two or more edges.
First, use the Sobel operator to calculate the x derivative:
Now we have positive and negative edges, but we want to count them both as positive. So take the absolute value:
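In case it helps, here is a rough sketch of those two steps in plain JavaScript. It does not use JSFeat's API; it assumes you already have the page as a row-major grayscale array (for example, converted from canvas `getImageData`), and the function name `sobelXAbs` is just something I made up for illustration.

```javascript
// gray: Uint8Array or Float32Array of length width * height (row-major grayscale).
// Returns |d/dx| computed with the 3x3 Sobel kernel. Border pixels are left at 0.
function sobelXAbs(gray, width, height) {
  const out = new Float32Array(width * height);
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const i = y * width + x;
      // Sobel x kernel: [-1 0 1; -2 0 2; -1 0 1]
      const gx =
        -gray[i - width - 1] + gray[i - width + 1]
        - 2 * gray[i - 1] + 2 * gray[i + 1]
        - gray[i + width - 1] + gray[i + width + 1];
      // Dark-to-light and light-to-dark edges both count as positive.
      out[i] = Math.abs(gx);
    }
  }
  return out;
}
```

Because of the absolute value, a black-on-white page and a white-on-black page give the same edge image.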
Now count the edges on each line. This can be done by summing the pixels up, or simply by scaling the image to a width of 1px, leaving the height unchanged. For easy viewing I've plotted the result:
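A minimal sketch of the summing step, assuming the edge image is a row-major array of non-negative values (`rowProfile` is a name I picked, not a library function). Dividing by the width gives the same thing as shrinking the image to 1px wide with area averaging.

```javascript
// edges: array of length width * height holding the absolute x derivative.
// Returns one value per scanline: the mean edge strength of that row.
function rowProfile(edges, width, height) {
  const profile = new Float32Array(height);
  for (let y = 0; y < height; y++) {
    let sum = 0;
    for (let x = 0; x < width; x++) {
      sum += edges[y * width + x];
    }
    profile[y] = sum / width;
  }
  return profile;
}
```

Rows that pass through text should come out with high values, and the gaps between text lines should come out close to zero.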
Now you'll need to threshold this result somehow, or maybe find the maxima after running a blur on the 1px-width image. If the font size and the letters per line stay roughly the same, this is easy.
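One possible way to do the thresholding, as a sketch only: blur the profile with a small box filter, then report every run of rows above a threshold as a text line. The box blur, the function names and the way you pick `threshold` (say, some fraction of the profile's maximum) are all my assumptions; you will probably need to tune them for your pages.

```javascript
// Simple 1D box blur; the window is clipped at both ends of the profile.
function boxBlur1D(profile, radius) {
  const out = new Float32Array(profile.length);
  for (let i = 0; i < profile.length; i++) {
    const from = Math.max(0, i - radius);
    const to = Math.min(profile.length - 1, i + radius);
    let sum = 0;
    for (let j = from; j <= to; j++) sum += profile[j];
    out[i] = sum / (to - from + 1);
  }
  return out;
}

// Returns [{ top, bottom }, ...] for each run of rows strictly above threshold.
function findTextLines(profile, threshold) {
  const lines = [];
  let start = -1;
  for (let y = 0; y < profile.length; y++) {
    const isText = profile[y] > threshold;
    if (isText && start < 0) start = y;
    if (!isText && start >= 0) {
      lines.push({ top: start, bottom: y - 1 });
      start = -1;
    }
  }
  // Close a run that reaches the last row.
  if (start >= 0) lines.push({ top: start, bottom: profile.length - 1 });
  return lines;
}
```

Typical use would be something like `findTextLines(boxBlur1D(profile, 2), threshold)`, where `top` and `bottom` are row indices in the original image.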
You may want to re-run on different rotations of the original image and then use the result with the highest contrast.
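Here is one way the rotation search could look, again only as a sketch. I am reading "highest contrast" as the variance of the row profile, which is my own interpretation. The rotation uses nearest-neighbour sampling, the profile uses a simple horizontal difference as a cheap stand-in for the Sobel step, and `fill` is an assumed background grey level for pixels rotated in from outside the image. All names here are hypothetical.

```javascript
// Rotate a row-major grayscale image about its centre (nearest neighbour).
function rotateNearest(gray, width, height, angleRad, fill) {
  const out = new Float32Array(width * height).fill(fill);
  const cx = (width - 1) / 2, cy = (height - 1) / 2;
  const c = Math.cos(angleRad), s = Math.sin(angleRad);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Inverse mapping: find the source pixel for each output pixel.
      const sx = Math.round(c * (x - cx) + s * (y - cy) + cx);
      const sy = Math.round(-s * (x - cx) + c * (y - cy) + cy);
      if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
        out[y * width + x] = gray[sy * width + sx];
      }
    }
  }
  return out;
}

// Per-row sum of absolute horizontal differences.
function edgeProfile(gray, width, height) {
  const profile = new Float64Array(height);
  for (let y = 0; y < height; y++) {
    for (let x = 1; x < width - 1; x++) {
      profile[y] += Math.abs(gray[y * width + x + 1] - gray[y * width + x - 1]);
    }
  }
  return profile;
}

// Variance of the profile, used here as the contrast score.
function variance(values) {
  let mean = 0;
  for (const v of values) mean += v;
  mean /= values.length;
  let sum = 0;
  for (const v of values) sum += (v - mean) * (v - mean);
  return sum / values.length;
}

// Try each candidate angle and keep the one with the highest score.
function bestRotation(gray, width, height, anglesRad, fill = 255) {
  let best = { angle: NaN, score: -Infinity };
  for (const angle of anglesRad) {
    const rotated = rotateNearest(gray, width, height, angle, fill);
    const score = variance(edgeProfile(rotated, width, height));
    if (score > best.score) best = { angle, score };
  }
  return best;
}
```

When the text lines are horizontal, the profile alternates sharply between text rows and gaps, so the variance peaks. For slight rotations a handful of candidate angles within a few degrees of zero should be enough, and you can then run the line detection on the image rotated by the winning angle.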
Upvotes: 1