Reputation: 51
I want to use Tesseract to recognize a single noiseless character in a common font (e.g. Times New Roman or Arial; nothing unusual). The input image contains only the character, so the image size is equivalent to the font size.
I have already set the page segmentation mode to single character, but the result is still unsatisfactory, with an error rate of about 50%.
I thought I could improve the result by telling Tesseract what the font size will be. Is there such a parameter? If so, does python-tesseract (the Python wrapper) allow setting it?
Upvotes: 5
Views: 6354
Reputation: 629
If your font size is too small, increase the image height and width so that Tesseract produces more accurate output.
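// Namespaces assumed from the calls below (the API shape matches the tessnet2 wrapper).
using System.Drawing;
using tessnet2;

// Load the source image and scale it to twice its original width and height.
var srcImage1 = System.Drawing.Image.FromFile(@"D:\Image\font_english.jpg");
var newWidth1 = srcImage1.Width * 2;
var newHeight1 = srcImage1.Height * 2;
var image = new Bitmap(srcImage1, new Size(newWidth1, newHeight1));

// Initialize the English engine and restrict recognition to the expected characters.
var ocr = new Tesseract();
ocr.Init(@"D:\OCRTEST\tessdata\", "eng", false);
ocr.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-0123456789'?.;=,()");

// Run OCR on the whole image and write out each recognized word.
var result = ocr.DoOCR(image, Rectangle.Empty);
foreach (Word word in result)
{
    Response.Write(word.Text + " ");
}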
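Since the question asks about the Python wrapper, the same upscaling idea might look roughly like this with Pillow and pytesseract. This is only a sketch, not part of the original answer: the helper names `scaled_size` and `ocr_single_char` are made up for illustration, the 40-pixel minimum height is an arbitrary assumption to tune for your images, and the `--psm 10` flag assumes a Tesseract version that accepts the double-dash form.

```python
def scaled_size(width, height, min_height=40):
    """Return (new_width, new_height) with height >= min_height, keeping aspect ratio.

    min_height=40 is an assumed starting point, not a documented Tesseract value.
    """
    if height >= min_height:
        return width, height
    factor = min_height / height
    return max(1, round(width * factor)), min_height


def ocr_single_char(path, min_height=40):
    """Upscale a small single-character image, then run Tesseract on it."""
    # Imported here so the sizing helper above works without these packages.
    from PIL import Image
    import pytesseract

    img = Image.open(path).convert("L")  # grayscale
    img = img.resize(scaled_size(*img.size, min_height), Image.LANCZOS)
    # --psm 10 asks Tesseract to treat the image as a single character.
    return pytesseract.image_to_string(img, config="--psm 10").strip()
```

Calling `ocr_single_char("char.png")` (a hypothetical file name) would return the recognized character as a string. Whether upscaling alone brings the error rate down enough depends on the images, so it is worth comparing a few `min_height` values.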
Upvotes: 3