Virat Mishra

Reputation: 165

How to clean an image's text before reading with tesseract?

I am using Tesseract to read text from an image. Because my input image is not simple text on a plain white background, only about 50% of the output is correct.

Is there any way to preprocess an image so that I get better output from Tesseract? I have already tried grayscaling and binarizing the image using Otsu's method, but there was no improvement.

Since I am doing all of this in Java, it would be helpful if someone could share details of a Java library, or the steps needed, to get better results from Tesseract.

I also cannot find good documentation on using ImageMagick from Java code. Any help on this is appreciated.

Sample image: any AT&T wireless bill.

Upvotes: 1

Views: 2128

Answers (2)

Virat Mishra

Reputation: 165

I tried to improve my output by grayscaling and binarizing the image, but that did not help. Then I used BoofCV to sharpen the image, and the output improved to about 90%.

Before sharpening, you can rescale the image if its resolution is too low, using the code below:

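import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Draws img, scaled by (fWidth, fHeight), into a new dWidth x dHeight image.
// Returns null if img is null.
public static BufferedImage scale(BufferedImage img, int imageType, int dWidth, int dHeight, double fWidth, double fHeight) {
    BufferedImage scaled = null;
    if (img != null) {
        scaled = new BufferedImage(dWidth, dHeight, imageType);
        Graphics2D g = scaled.createGraphics();
        AffineTransform at = AffineTransform.getScaleInstance(fWidth, fHeight);
        g.drawRenderedImage(img, at);
        g.dispose(); // release the graphics context once drawing is done
    }
    return scaled;
}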

Posting this in case anyone gets stuck in the same situation.
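The sharpening step itself is not shown above. As a rough illustration of what it does, here is a minimal sketch that uses only the JDK's `java.awt.image.ConvolveOp` with a common 3x3 sharpening kernel. This is not the BoofCV code used for the result above; the class name `SharpenSketch` and the kernel weights are assumptions chosen for illustration, and the kernel may need tuning for a given scan.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;

final class SharpenSketch {
    // Applies a 3x3 sharpening kernel (weights sum to 1, so flat areas keep
    // their brightness while edges are emphasized). Hypothetical helper, not BoofCV.
    static BufferedImage sharpen(BufferedImage src) {
        float[] weights = {
             0f, -1f,  0f,
            -1f,  5f, -1f,
             0f, -1f,  0f
        };
        Kernel kernel = new Kernel(3, 3, weights);
        // EDGE_NO_OP copies border pixels unchanged instead of zeroing them.
        ConvolveOp op = new ConvolveOp(kernel, ConvolveOp.EDGE_NO_OP, null);
        // Passing null lets ConvolveOp allocate a compatible destination image.
        return op.filter(src, null);
    }
}
```

A typical order would be to rescale with `scale(...)` first and then pass the result to `SharpenSketch.sharpen(...)` before handing the image to Tesseract.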

Upvotes: 1

fmw42

Reputation: 53164

I think the scan of your bill may be at too low a resolution. I believe you would get better results with a higher-resolution image (larger dimensions). You could also try saving your scan in a losslessly compressed format. You could try local area thresholding, but I do not think this will help with such small text. Nevertheless, in ImageMagick you can do that with the -lat option:

convert image.jpg -negate -lat 25x25+10% -negate result.png


Adjust values as desired. I also have a bash Unix shell script, textcleaner, that might work better on other images. You can see examples at http://www.fmwconcepts.com/imagemagick/textcleaner/index.php
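For readers who want to stay in Java, the idea behind local area thresholding can be sketched without ImageMagick. The code below is a simplified approximation, not a port of `-lat`: it marks a pixel black when it is darker than the mean of its surrounding window minus an offset, and white otherwise. The class name `LocalThresholdSketch`, the luminance weights, and the parameter names are assumptions for illustration, and the output is not guaranteed to match ImageMagick's. An offset of about 25 gray levels corresponds roughly to the 10% used in the command above.

```java
import java.awt.image.BufferedImage;

final class LocalThresholdSketch {
    // window: side length of the square neighborhood in pixels (e.g. 25).
    // offset: gray levels (0..255) a pixel must fall below the local mean to count as text.
    static BufferedImage localThreshold(BufferedImage src, int window, int offset) {
        int w = src.getWidth(), h = src.getHeight();
        int[][] gray = new int[h][w];
        // Integral image with an extra zero row and column, so any window sum is O(1).
        long[][] integral = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray[y][x] = (r * 299 + g * 587 + b * 114) / 1000; // approximate luminance
                rowSum += gray[y][x];
                integral[y + 1][x + 1] = integral[y][x + 1] + rowSum;
            }
        }
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        int half = window / 2;
        for (int y = 0; y < h; y++) {
            int y0 = Math.max(0, y - half), y1 = Math.min(h - 1, y + half);
            for (int x = 0; x < w; x++) {
                int x0 = Math.max(0, x - half), x1 = Math.min(w - 1, x + half);
                long sum = integral[y1 + 1][x1 + 1] - integral[y0][x1 + 1]
                         - integral[y1 + 1][x0] + integral[y0][x0];
                long count = (long) (x1 - x0 + 1) * (y1 - y0 + 1);
                // Equivalent to gray < mean - offset, kept in integer arithmetic.
                boolean dark = gray[y][x] * count < sum - offset * count;
                out.setRGB(x, y, dark ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

Because each pixel is compared against its own neighborhood rather than one global threshold, uneven backgrounds are handled better than with Otsu's method, but as noted above this may still not rescue very small text.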

Upvotes: 1
