Improving Tesseract OCR Quality Fails

Question

I am using Tesseract to scan receipts. The recognition quality was poor, so I followed this article on improving it: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#noise-removal. I implemented resizing, deskewing (aligning), and Gaussian blur, but only the deskewing seems to improve OCR accuracy. Below is my code for resizing and Gaussian blur. Am I doing anything wrong? If not, what else can I try?

Code:

+(UIImage *) prepareImage: (UIImage *)image{

    //converts UIImage to Mat format
    Mat im = cvMatWithImage(image);

    //grayscale image
    Mat gray;
    cvtColor(im, gray, CV_BGR2GRAY);

    //deskews text
    //did not provide code because I know it works
    Mat preprocessed = preprocess2(gray);
    double skew = hough_transform(preprocessed, im);
    Mat rotated = rot(im,skew* CV_PI/180);

    //resize image
    Mat scaledImage = scaleImage(rotated, 2);

    //Gaussian blur
    GaussianBlur(scaledImage, scaledImage, cv::Size(1, 1), 0, 0);

    return UIImageFromCVMat(scaledImage);
}

// Organization -> Resizing

Mat scaleImage(Mat mat, double factor){
    Mat resizedMat;
    double width = mat.cols;
    double height = mat.rows;
    double aspectRatio = width/height;
    resize(mat, resizedMat, cv::Size(width*factor*aspectRatio, height*factor*aspectRatio));
    return resizedMat;
}

Receipt: [image not available]

