Reputation: 3442
I am using OCR (Tesseract) to recognize digits in a picture:
var engine = new TesseractEngine(@"C:\Projects\tessdata", "eng", EngineMode.Default);
var currentImage = TakeScreen();
string text;
// Page is IDisposable; it must be disposed before engine.Process is called again
using (var page = engine.Process(ScaleByPercent(currentImage, 500)))
{
    text = page.GetText().Replace("\n", "");
}
The scaling method:
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

public Bitmap ScaleByPercent(Bitmap imgPhoto, int Percent)
{
    // Percent = 500 scales the image to 5x its original size
    float nPercent = (float)Percent / 100;
    int sourceWidth = imgPhoto.Width;
    int sourceHeight = imgPhoto.Height;
    var destWidth = (int)(sourceWidth * nPercent);
    var destHeight = (int)(sourceHeight * nPercent);

    var bmPhoto = new Bitmap(destWidth, destHeight, PixelFormat.Format24bppRgb);
    bmPhoto.SetResolution(imgPhoto.HorizontalResolution, imgPhoto.VerticalResolution);

    // The using block disposes the Graphics object even if DrawImage throws
    using (Graphics grPhoto = Graphics.FromImage(bmPhoto))
    {
        grPhoto.InterpolationMode = InterpolationMode.HighQualityBicubic;
        grPhoto.DrawImage(imgPhoto,
            new System.Drawing.Rectangle(0, 0, destWidth, destHeight),
            new System.Drawing.Rectangle(0, 0, sourceWidth, sourceHeight),
            GraphicsUnit.Pixel);
    }

    // Debug copy of the scaled image, to check what Tesseract actually receives
    bmPhoto.Save(@"D:\Scale.png", System.Drawing.Imaging.ImageFormat.Png);
    return bmPhoto;
}
But I get the result "10g".
Upvotes: 5
Views: 15721
Reputation: 1734
Strickos9 has shown you a way that partly solves this issue. The catch is that if you have to scan text of the same size that also contains some letters, you will get a bad result. Also, even with a whitelist restricted to digits, you may experience problems while scanning (for example, a 5 recognized as a 6), because Tesseract really struggles with low-quality characters, so I would highly recommend improving the quality of the image before you run OCR on it.
I've answered a similar question HERE, where someone was also unhappy with the results of scanning a low-quality picture.
Combined with what Strickos9 suggested (if you are going to scan only digits), this should give you much better scanning quality.
You can do this image processing with tools like OpenCV or MATLAB (although I've never tried this myself). If you run into trouble, post your follow-up questions in the comments.
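As a rough illustration of the kind of preprocessing meant here, the sketch below converts each pixel to grayscale (BT.601 luma weights) and applies a fixed global threshold, so the digits end up pure black on white before OCR. `Preprocess.Binarize` is a hypothetical helper, not part of Tesseract or System.Drawing. It assumes the 24bpp layout that `Bitmap.LockBits` returns for `PixelFormat.Format24bppRgb` (bytes in B, G, R order, each row `stride` bytes long), and the threshold of 128 is an arbitrary starting value; an adaptive method such as Otsu's would pick it automatically.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Tesseract or System.Drawing.
public static class Preprocess
{
    // Grayscale value using ITU-R BT.601 weights, in integer arithmetic (0..255).
    public static int Luma(byte r, byte g, byte b) => (299 * r + 587 * g + 114 * b) / 1000;

    // Global thresholding of a 24bpp buffer in B, G, R byte order with `stride` bytes per row.
    // Pixels darker than `threshold` become black (0), all others white (255).
    // Returns a new buffer; row padding bytes are copied unchanged.
    public static byte[] Binarize(byte[] bgr, int width, int height, int stride, int threshold)
    {
        if (bgr == null) throw new ArgumentNullException(nameof(bgr));
        if (width < 0 || height < 0 || stride < 3 * width)
            throw new ArgumentException("Invalid dimensions or stride.");
        if (bgr.Length < (long)stride * height)
            throw new ArgumentException("Buffer is smaller than stride * height.", nameof(bgr));

        var result = (byte[])bgr.Clone();
        for (int y = 0; y < height; y++)
        {
            int rowStart = y * stride;
            for (int x = 0; x < width; x++)
            {
                int i = rowStart + 3 * x;
                // Memory order is B, G, R, so red is at i + 2 and blue at i.
                int luma = Luma(bgr[i + 2], bgr[i + 1], bgr[i]);
                byte v = luma < threshold ? (byte)0 : (byte)255;
                result[i] = result[i + 1] = result[i + 2] = v;
            }
        }
        return result;
    }
}
```

To apply it to the bitmap returned by `ScaleByPercent`, you could call `LockBits` with `ImageLockMode.ReadWrite` and `PixelFormat.Format24bppRgb`, copy `data.Stride * data.Height` bytes from `data.Scan0` with `Marshal.Copy`, pass them to `Binarize`, copy the result back and call `UnlockBits`. This assumes a positive stride, which is the usual case for a bitmap created with `new Bitmap(width, height, format)`.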
Upvotes: 5
Reputation: 106
You can tell the Tesseract engine to look only for digits with the following code:
var engine = new TesseractEngine(@"C:\Projects\tessdata", "eng", EngineMode.Default);
engine.SetVariable("tessedit_char_whitelist", "0123456789");
Upvotes: 9