A191919

Reputation: 3442

OCR TesseractEngine

I am using OCR to recognize digits in a picture.

var engine = new TesseractEngine(@"C:\Projects\tessdata", "eng", EngineMode.Default);
var currentImage = TakeScreen();
var page = engine.Process(ScaleByPercent(currentImage, 500));
var text = page.GetText().Replace("\n", "");

Scale:

// Required namespaces:
// using System.Drawing;
// using System.Drawing.Drawing2D;
// using System.Drawing.Imaging;

public Bitmap ScaleByPercent(Bitmap imgPhoto, int Percent)
{
    // Convert the percentage (e.g. 500) into a scale factor (e.g. 5.0).
    float nPercent = Percent / 100f;

    int sourceWidth = imgPhoto.Width;
    int sourceHeight = imgPhoto.Height;
    var destWidth = (int)(sourceWidth * nPercent);
    var destHeight = (int)(sourceHeight * nPercent);

    var bmPhoto = new Bitmap(destWidth, destHeight, PixelFormat.Format24bppRgb);
    bmPhoto.SetResolution(imgPhoto.HorizontalResolution,
                          imgPhoto.VerticalResolution);

    // The using block disposes the Graphics object even if drawing throws.
    using (Graphics grPhoto = Graphics.FromImage(bmPhoto))
    {
        grPhoto.InterpolationMode = InterpolationMode.HighQualityBicubic;
        grPhoto.DrawImage(imgPhoto,
                          new Rectangle(0, 0, destWidth, destHeight),
                          new Rectangle(0, 0, sourceWidth, sourceHeight),
                          GraphicsUnit.Pixel);
    }

    // Save the scaled image for inspection.
    bmPhoto.Save(@"D:\Scale.png", ImageFormat.Png);
    return bmPhoto;
}

But I get the result "10g".

  1. How can I force the engine to recognize only digits?
  2. How can I get the number 1013?

Upvotes: 5

Views: 15721

Answers (2)

Dainius Šaltenis

Reputation: 1734

Strickos9 has shown you a good partial solution to this issue. However, if you later have to scan text of the same size that also contains letters, a digits-only whitelist will give you bad results. Also, even with a whitelist restricted to digits, you may experience problems while scanning (for example, a 5 read as a 6), because Tesseract really struggles with low-quality characters. So I would highly recommend that you:

  • Enlarge the image by 2-4 times.
  • Do some blur if needed to soften the edges of chars.
  • Process it with a 'threshold' or 'adaptive threshold' algorithm (to clear the blurred pixels and that blue color in the background).
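
As a rough illustration of the thresholding step, here is a minimal sketch using only System.Drawing. The class name OcrPreprocessing, the method name Threshold and the default cutoff of 128 are my own assumptions for the example, not part of Tesseract or any library, and a fixed global cutoff is the simplest variant rather than an adaptive threshold. GetPixel/SetPixel are slow, so treat this as a way to see the idea, not as production code.

```csharp
using System.Drawing;

public static class OcrPreprocessing
{
    // Hypothetical helper (not a Tesseract or System.Drawing API).
    // Turns every pixel black or white based on a fixed brightness cutoff.
    public static Bitmap Threshold(Bitmap source, byte cutoff = 128)
    {
        var result = new Bitmap(source.Width, source.Height);
        for (int y = 0; y < source.Height; y++)
        {
            for (int x = 0; x < source.Width; x++)
            {
                Color c = source.GetPixel(x, y);
                // Integer approximation of perceived brightness (Rec. 601 weights).
                int luma = (c.R * 299 + c.G * 587 + c.B * 114) / 1000;
                result.SetPixel(x, y, luma < cutoff ? Color.Black : Color.White);
            }
        }
        return result;
    }
}
```

You would call it on the already enlarged image, for example Threshold(ScaleByPercent(currentImage, 300)), and tune the cutoff by looking at the saved output until the digits are cleanly separated from the background.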

I've answered a similar question here, where someone was also unsatisfied with the results of scanning a low-quality picture.

Combined with what Strickos9 suggested (if you are only going to scan digits), this should give you very good recognition quality.

You can do this image processing with software like OpenCV or Matlab (although I've never tried the latter). If you are struggling with this, post any further questions in the comments.

Upvotes: 5

Strickos9

Reputation: 106

You can tell the Tesseract engine to look only for digits by using the following code:

var engine = new TesseractEngine(@"C:\Projects\tessdata", "eng", EngineMode.Default);
// Restrict recognition to the characters 0-9.
engine.SetVariable("tessedit_char_whitelist", "0123456789");

Upvotes: 9
