Min Joon Seo

Reputation: 51

Tesseract OCR: Parameter for Font Size (Single Character)

I want to use Tesseract to recognize a single, noise-free character in a common font (e.g., Times New Roman or Arial; no unusual fonts). The input image contains only the character, so the image size is effectively the font size.

I have already set the page segmentation mode to single character, but the results are still unsatisfactory, with an error rate of about 50%.

I thought I could improve the results by telling Tesseract what the font size will be. Is there such a parameter? And if so, does python-tesseract (the Python wrapper) allow tweaking it?

Upvotes: 5

Views: 6354

Answers (1)

Sathyaraj Palanisamy

Reputation: 629

If your font is too small, increase the image's width and height (i.e., upscale it) so that Tesseract produces more accurate output. For example:

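        using System.Drawing;
        using tessnet2; // assumption: Tesseract and Word come from the tessnet2 .NET wrapper

        // Load the source image and upscale it 2x so small glyphs become larger
        var srcImage1 = System.Drawing.Image.FromFile(@"D:\Image\font_english.jpg");
        var newWidth1 = srcImage1.Width * 2;
        var newHeight1 = srcImage1.Height * 2;
        var image = new Bitmap(srcImage1, new Size(newWidth1, newHeight1));

        var ocr = new Tesseract();
        ocr.Init(@"D:\OCRTEST\tessdata\", "eng", false);
        // Restrict recognition to the characters you expect to see
        ocr.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-0123456789'?.;=,()");

        // Rectangle.Empty means "OCR the whole image"
        var result = ocr.DoOCR(image, Rectangle.Empty);
        foreach (Word word in result)
        {
            // Response.Write assumes an ASP.NET page; use Console.Write in a console app
            Response.Write(word.Text + " ");
        }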
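A variation on the same idea, as a hedged sketch: instead of a fixed 2x factor, scale the character image to a chosen pixel height and surround it with a white margin before passing it to OCR. `CharImagePrep`, `UpscaleToHeight`, the target height and the padding value are hypothetical choices for illustration, not Tesseract parameters. The white margin reflects commonly given advice that Tesseract can struggle when a glyph touches the image edge (which matches the "image contains only the character" setup in the question); treat that as an assumption to verify on your own images.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

// Hypothetical helper (not part of any Tesseract wrapper): prepares a
// single-character image before it is passed to OCR.
static class CharImagePrep
{
    // Scales src so the glyph area is targetHeight pixels tall (aspect ratio kept),
    // then adds a white border of `padding` pixels on every side.
    public static Bitmap UpscaleToHeight(Image src, int targetHeight, int padding = 0)
    {
        if (targetHeight <= 0)
            throw new ArgumentOutOfRangeException(nameof(targetHeight));
        if (padding < 0)
            throw new ArgumentOutOfRangeException(nameof(padding));

        double scale = (double)targetHeight / src.Height;
        int newWidth = Math.Max(1, (int)Math.Round(src.Width * scale));

        var dst = new Bitmap(newWidth + 2 * padding, targetHeight + 2 * padding);
        using (var g = Graphics.FromImage(dst))
        {
            // Assumes dark text on a light background, so the margin is white
            g.Clear(Color.White);
            // Bicubic resampling generally gives smoother edges when enlarging
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            // Draw the scaled glyph inside the margin
            g.DrawImage(src, padding, padding, newWidth, targetHeight);
        }
        return dst;
    }
}
```

With `targetHeight = 48` and `padding = 10`, a 10x12 source image becomes a 60x68 bitmap (a 40x48 glyph area plus the margin), which you would then pass to `DoOCR` in place of `image`. Both numbers are arbitrary starting points; it is worth trying a few values against your own error rate.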

Upvotes: 3
