Reputation: 8222
I am writing a program that when given an image of a low level math problem (e.g. 98*13) should be able to output the answer. The numbers would be black, and the background white. Not a captcha, just an image of a math problem.
The math problems would only have two numbers and one operator, and that operator would only be +, -, *, or /.
Obviously, I know how to do the calculating ;) I'm just not sure how to go about getting the text from the image.
A free library would be ideal... although If I have to write the code myself I could probably manage.
Upvotes: 12
Views: 55377
Reputation: 494
IronOCR is free for development and testing. The default English language pack should do a good job of reading this, but you may also want to consider using a custom Tesseract language pack written specifically for equations.
See https://ironsoftware.com/csharp/ocr/languages/#custom-language-example
using IronOcr;
var Ocr = new IronTesseract();
Ocr.UseCustomTesseractLanguageFile("languages/equ.traineddata");
using (var Input = new OcrInput(@"images\equation.png"))
{
var Result = Ocr.Read(Input);
Console.WriteLine(Result.Text);
}
Disclaimer: I work for Iron Software.
Upvotes: 1
Reputation: 260
For extract words from image, I use the most accurate open source OCR engine: Tesseract. Available here or directly in your packages NuGet.
And this is my function in C#, which extract words from image passed in sourceFilePath
. Set EngineMode to TesseractAndCube; it detect more word than the other options.
var path = "YourSolutionDirectoryPath";
using (var engine = new TesseractEngine(path + Path.DirectorySeparatorChar + "tessdata", "fra", EngineMode.TesseractAndCube))
{
using (var img = Pix.LoadFromFile(sourceFilePath))
{
using (var page = engine.Process(img))
{
var text = page.GetText();
// text variable contains a string with all words found
}
}
}
I hope that helps.
Upvotes: 11
Reputation: 31
you can use Microsoft Office Document Imaging (Interop.MODI.dll) in visaul studio and extract text of pictures
Document modiDocument = new Document();
modiDocument.Create(filePath);
modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH);
MODI.Image modiImage = (modiDocument.Images[0] as MODI.Image);
string extractedText = modiImage.Layout.Text;
modiDocument.Close();
return extractedText;
Upvotes: 3
Reputation: 89232
You need OCR. There is the free Tesseract library from Google, but it's C code. You could use in a C++/CLI project and access via .NET.
This article gives some information on recognizing numbers (for Sudoku, but your problem is similar)
http://sudokugrab.blogspot.com/2009/07/how-does-it-all-work.html
Upvotes: 3
Reputation: 8017
Try this post regarding using the C++ Google Tessaract OCR lib in C#
OCR with the Tesseract interface
Upvotes: 6