batuman

Reputation: 7304

Reading error using Tesseract OCR

I use Tesseract OCR to read text. My binary image is clear, but the OCR output contains an error: the actual digits are 05820, but they are read as 05320. The image is very clear and sharp, so what could be wrong in my implementation? I have attached the image and the Tesseract code I used.

    int OCR::textRecognition(void){
        tesseract::TessBaseAPI tess;
        tess.Init(NULL, "eng", tesseract::OEM_DEFAULT);
        tess.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);

        // 8-bit single-channel image: 1 byte per pixel, extText.cols bytes per row
        tess.SetImage((uchar*)extText.data, extText.cols, extText.rows, 1, extText.cols);
        // Get the text
        char* out = tess.GetUTF8Text();
        std::cout << out << std::endl;
        delete[] out;  // GetUTF8Text() returns a buffer the caller must free
        return SUCCESS;
    }

enter image description here

Upvotes: 0

Views: 279

Answers (1)

Andrey Smorodov

Reputation: 10850

Try training Tesseract on the font you plan to work with. It should dramatically improve precision. You can use SerakTesseractTrainer to do this. Here is a YouTube tutorial: http://www.youtube.com/watch?v=47rgBL9NZkM
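To check whether retraining actually improves precision, it may help to compare the OCR output against a known value character by character. Below is a minimal sketch in plain C++ with no Tesseract dependency. The names `Mismatch`, `trimTrailingWhitespace`, and `charMismatches` are hypothetical helpers made up for this illustration, not part of any library. It assumes the recognized text is first stripped of trailing whitespace, since the string returned by `GetUTF8Text()` typically ends with newlines.

    #include <algorithm>
    #include <cctype>
    #include <cstddef>
    #include <string>
    #include <vector>

    // One position where the OCR output disagrees with the expected text.
    struct Mismatch {
        std::size_t index;  // zero-based position in the string
        char expected;      // character in the known ground truth
        char actual;        // character the OCR returned
    };

    // Hypothetical helper: drop trailing spaces and newlines from OCR output.
    std::string trimTrailingWhitespace(std::string s) {
        while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
            s.pop_back();
        }
        return s;
    }

    // Hypothetical helper: list every position where the two strings differ.
    // Only the overlapping prefix is compared; a length difference should be
    // checked separately by the caller.
    std::vector<Mismatch> charMismatches(const std::string& expected,
                                         const std::string& actual) {
        std::vector<Mismatch> result;
        const std::size_t n = std::min(expected.size(), actual.size());
        for (std::size_t i = 0; i < n; ++i) {
            if (expected[i] != actual[i]) {
                result.push_back({i, expected[i], actual[i]});
            }
        }
        return result;
    }

For the case in the question, `charMismatches("05820", trimTrailingWhitespace(out))` would report a single mismatch at index 2, where `8` was expected and `3` was read. Running the same check over a set of sample images before and after training gives a rough measure of whether the new font data helps.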

Upvotes: 2
