Reputation: 39
I am implementing an OCR system. The Tesseract API is used for text extraction. Images are preprocessed with OpenCV before the text is extracted: grayscaling, sharpening and adaptive thresholding are carried out. After extracting the text from the image, I get the following output.
Expected Output
Let's talk ;-)
Actual output
" yr _ W??? V. ? _
W fag '7? |g§3:? V
é claw?!
Does anybody know the reason for this? I edited the question because I took a different path to implement my project. I take an input image and use OpenCV to sharpen it. This is the input image: [input image]
Then I got the following output: [sharpened output image]. When I pass the sharpened image to the Tesseract API, it returns a mixture of characters. But if I pass the original input image to the Tesseract API, it extracts the words correctly. How can I remove those shaded areas in the sharpened image?
This is the code I used to sharpen the input image:
try {
    System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
    Mat source = Imgcodecs.imread("input.jpg", Imgcodecs.CV_LOAD_IMAGE_GRAYSCALE);
    Mat destination = new Mat(source.rows(), source.cols(), source.type());
    Imgproc.equalizeHist(source, destination);
    Imgcodecs.imwrite("sharpen.jpg", destination);
} catch (Exception e) {
    System.out.println("error: " + e.getMessage());
}
Upvotes: 0
Views: 883
Reputation: 1734
Well, you should at least provide us with the input image so we can see what the problem is. Judging from the expected and actual output, your input image is very hard to scan, and a few common issues could be behind that.
If you share your input image and how you process it, it will be much easier to find your problem, so please post them if possible.
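Regarding the shaded areas mentioned in the edit: as far as I know, `Imgproc.equalizeHist` performs histogram equalization (a global contrast stretch) rather than sharpening, so it can make uneven lighting more visible instead of less. The question already mentions adaptive thresholding, which is one common way to suppress gradual shading before OCR; in OpenCV that would be `Imgproc.adaptiveThreshold`. To show the idea without depending on OpenCV, here is a minimal pure-Java sketch. It is not OpenCV's implementation, and the class name, the synthetic sample image, the window radius and the offset are all assumptions made up for illustration.

```java
class AdaptiveThresholdSketch {

    // Binarize a grayscale image (values 0..255) by comparing each pixel with
    // the mean of its (2*radius+1) x (2*radius+1) neighbourhood minus an offset c.
    // Pixels darker than that local threshold become 0 (ink), all others 255.
    static int[][] adaptiveMeanThreshold(int[][] gray, int radius, int c) {
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                int count = 0;
                // The window is clipped at the image borders.
                for (int yy = Math.max(0, y - radius); yy <= Math.min(h - 1, y + radius); yy++) {
                    for (int xx = Math.max(0, x - radius); xx <= Math.min(w - 1, x + radius); xx++) {
                        sum += gray[yy][xx];
                        count++;
                    }
                }
                double localMean = (double) sum / count;
                out[y][x] = gray[y][x] < localMean - c ? 0 : 255;
            }
        }
        return out;
    }

    // Hypothetical sample: a left-to-right shading gradient with one dark stroke.
    static int[][] shadedSample() {
        int h = 9, w = 20;
        int[][] img = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img[y][x] = 120 + 5 * x; // background brightens from 120 to 215
            }
        }
        for (int y = 2; y <= 6; y++) {
            img[y][10] = 40; // vertical stroke, much darker than its surroundings
        }
        return img;
    }

    public static void main(String[] args) {
        int[][] bin = adaptiveMeanThreshold(shadedSample(), 3, 10);
        // Print the binary image: '#' for ink, '.' for background.
        for (int[] row : bin) {
            StringBuilder sb = new StringBuilder();
            for (int v : row) sb.append(v == 0 ? '#' : '.');
            System.out.println(sb);
        }
    }
}
```

Because each pixel is compared only with its own neighbourhood, the slow brightness gradient drops out and only the stroke is kept as ink. On a real scan the window size and offset would need tuning, and it is worth checking whether Tesseract does better on the thresholded image or on the untouched input.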
Upvotes: 1