Reputation: 165
I am using Tesseract to read text from an image. Because my BinaryImage input is not simple text on a plain white background, only about 50% of the output is correct.
Is there any way to preprocess an image so that Tesseract gives correct output? I have already tried greyscaling the image and binarizing it with Otsu's method, but there was no improvement.
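For reference, greyscaling plus Otsu binarization (the preprocessing described above) can be done in plain Java without any library. This is a minimal sketch, not the asker's code. It assumes the common luma weights (0.299, 0.587, 0.114) for the grey conversion, and the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

final class OtsuBinarizer {
    // Converts any BufferedImage to 8-bit grey values using luma weights
    // 0.299 R + 0.587 G + 0.114 B; result is indexed [y][x].
    static int[][] toGray(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        int[][] g = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, gr = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                g[y][x] = (int) Math.round(0.299 * r + 0.587 * gr + 0.114 * b);
            }
        }
        return g;
    }

    // Otsu's method: returns the threshold t that maximises the between-class
    // variance of the classes [0, t] and [t + 1, 255].
    static int otsuThreshold(int[][] gray) {
        long[] hist = new long[256];
        long total = 0;
        for (int[] row : gray) {
            for (int v : row) { hist[v]++; total++; }
        }
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += i * (double) hist[i];

        double sumB = 0, bestVar = -1;
        long wB = 0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wB += hist[t];                 // weight of the "background" class
            if (wB == 0) continue;
            long wF = total - wB;          // weight of the "foreground" class
            if (wF == 0) break;
            sumB += t * (double) hist[t];
            double mB = sumB / wB, mF = (sumAll - sumB) / wF;
            double between = (double) wB * wF * (mB - mF) * (mB - mF);
            if (between > bestVar) { bestVar = between; best = t; }
        }
        return best;
    }

    // Greyscales, thresholds with Otsu, and returns a black/white image:
    // pixels <= threshold become 0 (black), the rest 255 (white).
    static BufferedImage binarize(BufferedImage img) {
        int[][] g = toGray(img);
        int t = otsuThreshold(g);
        int w = img.getWidth(), h = img.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster r = out.getRaster();
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                r.setSample(x, y, 0, g[y][x] <= t ? 0 : 255);
        return out;
    }
}
```

Otsu picks a single global threshold for the whole image, so on a bill with shaded boxes or coloured backgrounds it can misclassify regions where a local (adaptive) threshold may do better.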
As I am doing all this in Java, it would be helpful if someone could share details of a Java library, or the steps needed, to get better results from Tesseract.
I also cannot find good ImageMagick documentation for using it from Java code. Any help on this is appreciated.
Sample image: any AT&T wireless bill.
Upvotes: 1
Reputation: 165
I tried to improve the output by greyscaling and binarizing the image, but it didn't help. Then I tried sharpening the image with BoofCV, and about 90% of the output was correct.
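The BoofCV code itself is not shown. As a dependency-free illustration of the same idea, here is a sketch that applies the 4-neighbour Laplacian sharpening kernel using only JDK classes. It assumes a TYPE_BYTE_GRAY input (for example the output of a greyscale conversion), the class name Sharpen4 is hypothetical, and BoofCV's own sharpening routines may differ in detail.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

final class Sharpen4 {
    // Sharpens a TYPE_BYTE_GRAY image with the 4-neighbour kernel
    //    0 -1  0
    //   -1  5 -1
    //    0 -1  0
    // Results are clamped to 0..255; border pixels are copied unchanged.
    static BufferedImage sharpen(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        Raster src = gray.getRaster();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int c = src.getSample(x, y, 0);
                if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                    dst.setSample(x, y, 0, c); // no full neighbourhood at the border
                    continue;
                }
                int v = 5 * c
                        - src.getSample(x - 1, y, 0) - src.getSample(x + 1, y, 0)
                        - src.getSample(x, y - 1, 0) - src.getSample(x, y + 1, 0);
                dst.setSample(x, y, 0, Math.max(0, Math.min(255, v)));
            }
        }
        return out;
    }
}
```

The kernel weights sum to 1, so flat regions keep their grey level while edges (such as stroke boundaries of characters) get extra contrast.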
Before sharpening, you can rescale the image if its resolution is not high enough, using the code below:
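import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Draws sImg into a new dWidth x dHeight image of type imageType,
// scaled by fWidth horizontally and fHeight vertically.
public static BufferedImage scale(BufferedImage sImg, int imageType, int dWidth, int dHeight, double fWidth, double fHeight) {
    BufferedImage dImg = null;
    if (sImg != null) {
        dImg = new BufferedImage(dWidth, dHeight, imageType);
        Graphics2D g = dImg.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        AffineTransform at = AffineTransform.getScaleInstance(fWidth, fHeight);
        g.drawRenderedImage(sImg, at);
        g.dispose(); // release the graphics context
    }
    return dImg;
}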
Posting this in case anyone else gets stuck in the same situation.
Upvotes: 1
Reputation: 53164
I think the scan of your bill may be at too low a resolution. I believe you would get better results with a higher-resolution image (larger dimensions). You could also try saving your scan in a lossless format such as PNG or TIFF rather than JPEG. You could try local area thresholding, although I do not think it will help much with such small text. Nevertheless, in ImageMagick you can do that with the -lat option:
convert image.jpg -negate -lat 25x25+10% -negate result.png
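For a pure-Java alternative, here is a rough approximation of the -negate -lat 25x25+10% -negate pipeline: local mean thresholding, where a pixel becomes black (text) if it is darker than the mean of its window by more than the offset. This is my own sketch, not ImageMagick's implementation; it assumes a TYPE_BYTE_GRAY input, ImageMagick's exact -lat behaviour (border handling, rounding) may differ, and the class name LocalThreshold is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

final class LocalThreshold {
    // Local mean thresholding of a TYPE_BYTE_GRAY image. A pixel becomes
    // black (0) when it is darker than the mean of its win x win neighbourhood
    // by more than offsetFraction * 255; otherwise it becomes white (255).
    // Windows are clipped at the image border.
    static BufferedImage threshold(BufferedImage gray, int win, double offsetFraction) {
        int w = gray.getWidth(), h = gray.getHeight();
        Raster src = gray.getRaster();

        // Integral image: sum[y][x] = sum of all pixels in rows < y and columns < x.
        long[][] sum = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                rowSum += src.getSample(x, y, 0);
                sum[y + 1][x + 1] = sum[y][x + 1] + rowSum;
            }
        }

        int half = win / 2;
        double offset = offsetFraction * 255.0;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster dst = out.getRaster();
        for (int y = 0; y < h; y++) {
            int y0 = Math.max(0, y - half), y1 = Math.min(h, y + half + 1);
            for (int x = 0; x < w; x++) {
                int x0 = Math.max(0, x - half), x1 = Math.min(w, x + half + 1);
                long area = (long) (x1 - x0) * (y1 - y0);
                long s = sum[y1][x1] - sum[y0][x1] - sum[y1][x0] + sum[y0][x0];
                double mean = (double) s / area;
                int p = src.getSample(x, y, 0);
                dst.setSample(x, y, 0, p < mean - offset ? 0 : 255);
            }
        }
        return out;
    }
}
```

With win = 25 and offsetFraction = 0.10 this corresponds roughly to the command above. The integral image (summed-area table) makes each window mean a constant-time lookup, so the cost does not grow with the window size.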
Adjust the values as desired. I also have a bash Unix shell script, textcleaner, that might work better on other images. You can see examples at http://www.fmwconcepts.com/imagemagick/textcleaner/index.php
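Since the question also asks how to use ImageMagick from Java: one option that avoids a Java binding library is to run the convert command shown above as an external process with ProcessBuilder. This is an illustrative sketch; it assumes ImageMagick is installed with convert on the PATH (on ImageMagick 7 the preferred command is magick), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ImageMagickRunner {
    // Builds the argument list for:
    //   convert <in> -negate -lat <geometry> -negate <out>
    // Passing arguments as a list avoids shell quoting issues (e.g. the '%').
    static List<String> latCommand(String in, String out, String geometry) {
        return new ArrayList<>(Arrays.asList(
                "convert", in, "-negate", "-lat", geometry, "-negate", out));
    }

    // Runs the command, sending its output to this JVM's console,
    // and returns the process exit code (0 normally means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO();
        Process p = pb.start();
        return p.waitFor();
    }
}
```

For example, ImageMagickRunner.run(ImageMagickRunner.latCommand("image.jpg", "result.png", "25x25+10%")) would perform the same operation as the command line above, after which result.png can be passed to Tesseract.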
Upvotes: 1