Unclear text extraction by Tesseract

Question

I am implementing an OCR system that uses the Tesseract API for text extraction. Images are preprocessed with OpenCV before extraction: grayscaling, sharpening, and adaptive thresholding are applied. After extracting the text from the image, I get the output shown below.
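For reference, the adaptive thresholding step mentioned above can be sketched in plain Java, without OpenCV. This is only an illustration of the mean-based variant: the class name `AdaptiveThresholdSketch`, the method `threshold`, and the parameters `blockSize` and `c` are hypothetical names chosen for this sketch, and the window is simply clipped at the image border, which may differ from how OpenCV handles borders.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; this is not OpenCV's implementation.
final class AdaptiveThresholdSketch {

    /**
     * Mean adaptive threshold: a pixel becomes 255 when it is brighter than
     * (local mean - c), otherwise 0. The window is clipped at the image border.
     */
    static int[][] threshold(int[][] gray, int blockSize, int c) {
        int rows = gray.length;
        int cols = rows == 0 ? 0 : gray[0].length;
        int half = blockSize / 2;
        int[][] out = new int[rows][cols];
        for (int y = 0; y < rows; y++) {
            for (int x = 0; x < cols; x++) {
                long sum = 0;
                int count = 0;
                // Accumulate the neighbourhood around (y, x), staying inside the image.
                for (int yy = Math.max(0, y - half); yy <= Math.min(rows - 1, y + half); yy++) {
                    for (int xx = Math.max(0, x - half); xx <= Math.min(cols - 1, x + half); xx++) {
                        sum += gray[yy][xx];
                        count++;
                    }
                }
                double mean = (double) sum / count;
                out[y][x] = gray[y][x] > mean - c ? 255 : 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Bright background (200) with one dark "ink" pixel (50) in the centre.
        int[][] gray = {
            {200, 200, 200},
            {200, 50, 200},
            {200, 200, 200}
        };
        // Prints [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
        System.out.println(Arrays.deepToString(threshold(gray, 3, 10)));
    }
}
```

Because each pixel is compared with its own neighbourhood rather than one global value, uneven lighting tends to matter less than it does with a fixed threshold.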

Expected Output

Let's talk ;-)

Actual Output

" yr _ W??? V. ? _
W fag '7? |g§3:? V
é claw?!

Does anybody know the reason for this? I edited the question because I took a different path to implement my project. I loaded an image and used OpenCV to sharpen it. This is the input image: [input image]

Then I got the following output: [sharpened output image]. When I pass the sharpened image to the Tesseract API, it returns a jumble of characters. But when I pass the original input image to the Tesseract API, it extracts the words correctly. How can I remove the shaded areas in the sharpened image?

This is the code I used to sharpen the input image:

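import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class Sharpen {
    public static void main(String[] args) {
        try {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
            // Load as single-channel grayscale. IMREAD_GRAYSCALE is the current
            // name for the older CV_LOAD_IMAGE_GRAYSCALE constant.
            Mat source = Imgcodecs.imread("input.jpg", Imgcodecs.IMREAD_GRAYSCALE);
            Mat destination = new Mat(source.rows(), source.cols(), source.type());

            // Note: equalizeHist stretches global contrast; it is not a sharpening filter.
            Imgproc.equalizeHist(source, destination);
            Imgcodecs.imwrite("sharpen.jpg", destination);
        } catch (Exception e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}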
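The code above calls `Imgproc.equalizeHist`, which performs histogram equalization rather than sharpening. One plausible reason for the shaded areas is that equalization stretches even faint brightness differences in the background across the full 0 to 255 range. The following is a minimal pure-Java sketch of the standard cumulative-histogram formulation, assuming 8-bit gray values stored in an `int[]`. The class `HistEqSketch` and method `equalize` are hypothetical names for illustration, and OpenCV's own implementation may differ in rounding details.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; this is not OpenCV's implementation.
final class HistEqSketch {

    /** Equalizes 8-bit gray levels (0..255) using the cumulative histogram. */
    static int[] equalize(int[] pixels) {
        int[] hist = new int[256];
        for (int p : pixels) {
            hist[p]++;
        }
        // Cumulative distribution: cdf[v] = number of pixels with value <= v.
        int[] cdf = new int[256];
        int running = 0;
        for (int v = 0; v < 256; v++) {
            running += hist[v];
            cdf[v] = running;
        }
        // Smallest non-zero CDF value, i.e. the count of the darkest level present.
        int cdfMin = 0;
        for (int v = 0; v < 256; v++) {
            if (cdf[v] > 0) {
                cdfMin = cdf[v];
                break;
            }
        }
        int total = pixels.length;
        if (total == cdfMin) {
            // Empty input or a single gray level: nothing to spread out.
            return Arrays.copyOf(pixels, total);
        }
        int[] out = new int[total];
        for (int i = 0; i < total; i++) {
            out[i] = (int) Math.round(255.0 * (cdf[pixels[i]] - cdfMin) / (total - cdfMin));
        }
        return out;
    }

    public static void main(String[] args) {
        // A nearly uniform "paper" background with faint shading (200..203).
        int[] background = {200, 200, 201, 201, 202, 202, 203, 203};
        // Prints [0, 0, 85, 85, 170, 170, 255, 255]
        System.out.println(Arrays.toString(equalize(background)));
    }
}
```

In this toy input the background varies by only three gray levels, yet after equalization it spans black to white. A real photograph contains many more levels, but the same effect can turn soft shadows into strong shaded regions that confuse OCR.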
