Reputation: 41
Given an image (e.g. a newspaper, a scanned newspaper, a magazine, etc.), how do I detect the regions containing text? I only need to locate those regions and remove them; I don't need to do text recognition.
The purpose is to remove these text areas so that my feature extraction procedure runs faster, since the text areas are meaningless for my application. Does anyone know how to do this?
BTW, it would be good if this could be done in Matlab!
Best!
Upvotes: 4
Views: 2962
Reputation: 114786
You can use the Stroke Width Transform (SWT) to highlight text regions. Using my mex implementation posted here, you can run:
img = imread('https://i.sstatic.net/Eyepc.jpg');
[swt swtcc] = SWT( img, 0, 10 );
Playing with the internal parameters of the edge-map extraction and image filtering in SWT.m can help you tweak the resulting mask to your needs.
To get this result, I used these parameters for the edge map computation in SWT.m:
edgeMap = single( edge( img, 'canny', [0.05 0.25] ) );
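
Once a detector's output has been reduced to a logical mask, removing the text comes down to overwriting the masked pixels. The following is a minimal sketch and not part of SWT.m: it assumes a grayscale image, and `textMask` is a hand-made stand-in for a thresholded detector output (how `swt` or `swtcc` map onto such a mask is an assumption you would need to check against the implementation).

```matlab
% Synthetic grayscale "page": light background with one dark block of ink.
img = uint8(220 * ones(40, 60));
img(10:14, 8:30) = 20;

% Hypothetical text mask; in practice derive it from your detector's output.
textMask = img < 128;

% Grow the mask by 2 pixels in every direction so glyph edges are covered.
% conv2 with a 5x5 box kernel acts as a simple dilation (no toolbox needed).
grown = conv2(double(textMask), ones(5), 'same') > 0;

% Estimate the page background from the unmasked pixels and paint over text.
bg = median(img(~grown));
cleaned = img;
cleaned(grown) = bg;
```

If the goal is only to speed up feature extraction, it may be enough to skip pixels where `grown` is true instead of repainting them.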
Upvotes: 2
Reputation: 1327
If your image is well binarized and you know the usual size of the text, you could use the HorizontalRunLengthSmoothing and VerticalRunLengthSmoothing algorithms. They are implemented in the open source library AForge.NET, but it should be easy to reimplement them in Matlab. The intersection of the result images from these two algorithms gives a good indication of which regions contain text. It is not perfect, but it is fast.
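
A rough Matlab sketch of the idea, written from scratch rather than ported from AForge.NET, so its behaviour at borders and its thresholds may differ from the library. It assumes `bw` is a logical image with `true` for ink; the thresholds `hThresh` and `vThresh` are illustrative values that would need tuning to the text size.

```matlab
% Synthetic binary page: three "text lines", the middle one split in two words.
bw = false(14, 16);
bw(2:3,   2:12) = true;
bw(6:7,   2:5)  = true;
bw(6:7,   9:12) = true;
bw(10:11, 2:12) = true;

hThresh = 5;   % longest horizontal background gap to fill (assumed value)
vThresh = 8;   % longest vertical background gap to fill (assumed value)

% Horizontal smoothing: fill short background runs between ink in each row.
H = bw;
for r = 1:size(bw, 1)
    idx = find(bw(r, :));
    for k = 1:numel(idx) - 1
        gap = idx(k+1) - idx(k) - 1;
        if gap > 0 && gap <= hThresh
            H(r, idx(k):idx(k+1)) = true;
        end
    end
end

% Vertical smoothing: the same operation applied to each column.
V = bw;
for c = 1:size(bw, 2)
    idx = find(bw(:, c));
    for k = 1:numel(idx) - 1
        gap = idx(k+1) - idx(k) - 1;
        if gap > 0 && gap <= vThresh
            V(idx(k):idx(k+1), c) = true;
        end
    end
end

% Pixels kept by both passes are candidate text regions.
textMask = H & V;
```

In this toy image the gap between the two words on the middle line is filled, while the blank rows between lines stay empty because the horizontal pass finds no ink there.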
Upvotes: 1
Reputation: 39389
This example in the Computer Vision System Toolbox in Matlab shows how to detect text using MSER regions.
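
A minimal sketch of the first step, assuming the Computer Vision Toolbox is installed. The image here is synthetic, and the `RegionAreaRange` and `ThresholdDelta` values are starting guesses, not recommended settings.

```matlab
% Requires the Computer Vision Toolbox for detectMSERFeatures.
% Synthetic grayscale page with dark bars standing in for glyphs;
% replace it with a grayscale version of your own scan.
I = uint8(220 * ones(80, 160));
I(20:40, 20:35) = 30;
I(20:40, 50:65) = 30;

% Area limits and threshold step are guesses to tune per image.
regions = detectMSERFeatures(I, 'RegionAreaRange', [50 2000], ...
                                'ThresholdDelta', 4);

% Paint every pixel of every detected region into a logical mask.
mask = false(size(I));
for k = 1:regions.Count
    xy = double(regions.PixelList{k});          % P-by-2 list of [x y]
    mask(sub2ind(size(I), xy(:, 2), xy(:, 1))) = true;
end
```

MSER on its own also fires on non-text blobs, so on real pages some filtering of the regions by shape is usually needed before treating `mask` as a text mask.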
Upvotes: 1
Reputation: 112
Text detection in natural images is an active area of research in the computer vision community. You can refer to the ICDAR papers. But in your case I think it should be simple enough: since your text comes from newspapers or magazines, it should be of fixed size and horizontally oriented.
So, you can apply a scanning window of a fixed size, say 32x32. Train a classifier on the ICDAR 2003 training dataset, using windows that contain text as positive examples. You can use a small feature set of color and gradients and train an SVM, which gives a positive or negative result depending on whether a window contains text.
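
A toy sketch of the scanning-window idea, assuming the Statistics and Machine Learning Toolbox for `fitcsvm`. Everything about the data is an assumption made for illustration: the training windows are random patches standing in for crops from ICDAR 2003, and the three features (mean intensity, mean horizontal and vertical gradient) are only one possible reading of "color and gradients".

```matlab
rng(0);                     % reproducible toy data
win = 32;                   % window size suggested above

% Feature vector for one window: mean intensity plus mean absolute
% horizontal and vertical differences.
feat = @(w) [mean(w(:)), ...
             mean(mean(abs(diff(w, 1, 2)))), ...
             mean(mean(abs(diff(w, 1, 1))))];

% Toy training set (assumption: replace with windows cropped from ICDAR 2003).
nTrain = 40;
X = zeros(2 * nTrain, 3);
Y = [ones(nTrain, 1); -ones(nTrain, 1)];     % +1 = text, -1 = background
for k = 1:nTrain
    txt = 255 * double(rand(win) > 0.5);     % busy, high-gradient patch
    bgd = 200 + 5 * randn(win);              % smooth background patch
    X(k, :)          = feat(txt);
    X(nTrain + k, :) = feat(bgd);
end
mdl = fitcsvm(X, Y, 'Standardize', true);

% Test page: smooth background with one busy block in the middle.
page = 200 + 5 * randn(96, 128);
page(33:64, 33:96) = 255 * double(rand(32, 64) > 0.5);

% Slide non-overlapping windows and mark those classified as text.
textMask = false(size(page));
for r = 1:win:size(page, 1) - win + 1
    for c = 1:win:size(page, 2) - win + 1
        w = page(r:r+win-1, c:c+win-1);
        if predict(mdl, feat(w)) == 1
            textMask(r:r+win-1, c:c+win-1) = true;
        end
    end
end
```

On real scans you would probably want overlapping windows (a stride smaller than `win`) so that text straddling a window boundary is not missed.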
For reference, see http://crypto.stanford.edu/~dwu4/ICDAR2011.pdf and, for code, try the authors' homepages.
Upvotes: 1