ductran

Text cleaner in OpenCV like the ImageMagick script

I am trying to make the text in an image cleaner and clearer before running OCR with Tesseract. At this link, they provide a good script that does this with ImageMagick. Is it possible to convert this script and its functions into OpenCV code? For example, the script is run with these arguments:

-g -e none -f 15 -o 20

From the explanation:

-g ...................... convert document to grayscale before enhancing
-e .... enhance ......... enhance image brightness before cleaning;
                       choices are: none, stretch or normalize; 
                       default=none
-f .... filtersize ...... size of filter used to clean background;
                       integer>0; default=15
-o .... offset .......... offset of filter in percent used to reduce noise;
                       integer>=0; default=5

How can I do the same in OpenCV? I am new to OpenCV and only know how to convert to grayscale. Any help would be appreciated.

Answers (1)

remi

You will have to check the ImageMagick documentation to find the exact algorithms used, but here is a rough guess:

-g ...................... convert document to grayscale before enhancing

That would be either cv::cvtColor with the BGR2GRAY conversion or, even better, loading your image directly in grayscale with cv::imread(filename, CV_LOAD_IMAGE_GRAYSCALE) (in newer OpenCV versions the flag is cv::IMREAD_GRAYSCALE).
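For reference, the BGR2GRAY conversion is a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B, as documented for OpenCV's color conversions. The plain C++ sketch below computes that on an interleaved 8-bit BGR buffer just to show what the call does; `bgrToGray` is a hypothetical helper, not an OpenCV function, and OpenCV itself uses fixed-point arithmetic, so individual pixels may differ by rounding. In real code, call cv::cvtColor or cv::imread as described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert an interleaved 8-bit BGR buffer (B, G, R per pixel)
// to one 8-bit gray value per pixel using the BGR2GRAY weights.
std::vector<std::uint8_t> bgrToGray(const std::vector<std::uint8_t>& bgr) {
    assert(bgr.size() % 3 == 0);  // three bytes per pixel
    std::vector<std::uint8_t> gray(bgr.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        const double b = bgr[3 * i];
        const double g = bgr[3 * i + 1];
        const double r = bgr[3 * i + 2];
        const double y = 0.299 * r + 0.587 * g + 0.114 * b;  // weights sum to 1, so y stays in [0, 255]
        gray[i] = static_cast<std::uint8_t>(std::lround(y));
    }
    return gray;
}
```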

-e .... enhance ......... enhance image brightness before cleaning;
                       choices are: none, stretch or normalize; 
                       default=none

Since you chose "none", that step does nothing. Otherwise, use cv::equalizeHist (tutorial).
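One hedged caveat: if the script's stretch and normalize choices map to ImageMagick's -contrast-stretch and -normalize (check the script to be sure), those are linear stretches of the intensity range, whereas cv::equalizeHist remaps intensities non-linearly according to the histogram. A closer OpenCV match may then be cv::normalize(src, dst, 0, 255, cv::NORM_MINMAX). The plain C++ sketch below shows what such a min-max stretch computes on an 8-bit buffer; `minMaxStretch` is a hypothetical name, and unlike ImageMagick's -normalize it does not clip a percentage of the darkest and brightest pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: linearly map the darkest pixel to 0 and the brightest
// to 255 (min-max stretch, no percentile clipping).
std::vector<std::uint8_t> minMaxStretch(const std::vector<std::uint8_t>& src) {
    assert(!src.empty());
    const auto mm = std::minmax_element(src.begin(), src.end());
    const int lo = *mm.first;
    const int hi = *mm.second;
    if (lo == hi) return src;  // flat image: nothing to stretch

    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const double v = (src[i] - lo) * 255.0 / (hi - lo);
        dst[i] = static_cast<std::uint8_t>(std::lround(v));
    }
    return dst;
}
```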

-f .... filtersize ...... size of filter used to clean background;
                       integer>0; default=15
-o .... offset .......... offset of filter in percent used to reduce noise;
                       integer>=0; default=5

My guess for the last two parameters is cv::adaptiveThreshold, with -f corresponding to the blockSize parameter in OpenCV (which must be odd, so 15 works as is) and -o to the constant C. Which adaptive thresholding method is used (Gaussian or mean) is what you need to check in the ImageMagick documentation.
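To make this guessed mapping concrete, here is a plain C++ sketch of what cv::adaptiveThreshold computes with ADAPTIVE_THRESH_MEAN_C and THRESH_BINARY: each pixel becomes 255 if it is greater than the mean of its blockSize × blockSize neighbourhood minus C, and 0 otherwise. The assumptions are mine, not the script's: the mean method is used, the -o percentage is converted to intensity units as C = offset × 255 / 100 (so -o 20 gives C = 51), and edge pixels are replicated at the border. `adaptiveMeanThreshold` is a hypothetical name, and rounding details may differ slightly from OpenCV's optimized implementation. Under those assumptions the OpenCV call would be cv::adaptiveThreshold(gray, dst, 255, cv::ADAPTIVE_THRESH_MEAN_C, cv::THRESH_BINARY, 15, 51).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring adaptive mean thresholding on a row-major
// 8-bit image. blockSize plays the role of -f; c is the offset in intensity units.
std::vector<std::uint8_t> adaptiveMeanThreshold(const std::vector<std::uint8_t>& src,
                                                int rows, int cols,
                                                int blockSize, double c) {
    // Same precondition as OpenCV: odd block size greater than 1 (3, 5, 7, ...).
    assert(blockSize > 1 && blockSize % 2 == 1);
    assert(rows > 0 && cols > 0);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);

    const int r = blockSize / 2;
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            // Brute-force local mean; out-of-range coordinates are clamped,
            // i.e. edge pixels are replicated.
            long sum = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), rows - 1);
                    const int xx = std::min(std::max(x + dx, 0), cols - 1);
                    sum += src[static_cast<std::size_t>(yy) * cols + xx];
                }
            }
            const double mean = std::round(static_cast<double>(sum) / (blockSize * blockSize));
            const std::size_t idx = static_cast<std::size_t>(y) * cols + x;
            // Pixels brighter than (mean - c) become white background,
            // darker ones (text strokes) become black.
            dst[idx] = static_cast<std::uint8_t>(src[idx] > mean - c ? 255 : 0);
        }
    }
    return dst;
}
```

The brute-force window is only for readability; OpenCV computes the local mean with a box filter. Trying c = 0 on a perfectly flat patch shows every pixel going black, because a pixel is never strictly above its own mean. In a noisy background, roughly half the pixels would land on each side of the mean, which is presumably why a positive offset "reduces noise".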
