Austin Mckay

Reputation: 51

Data Entry Automation by Field Identification and Optical Character Recognition (OCR) for Handwriting on Predefined Forms

I'm looking to automate data entry from predefined forms that have been filled out by hand. The characters are not separated, but the fields are identifiable by lines underneath or as a part of a table. I know that handwriting OCR is still an area of active research, and I can include an operator review function, so I do not expect accuracy above 90%.

The first solution that I thought of is a combination of OpenCV for field identification (http://answers.opencv.org/question/63847/how-to-extract-tables-from-an-image/) and Tesseract to recognize the handwriting (https://github.com/openpaperwork/pyocr).

Another potentially simpler and more efficacious method for field identification with a predefined form would be to somehow subtract the blank form from the filled form. Since the forms would be scanned, this would likely require some location tolerance, noise reduction, and feature recognition.
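A minimal sketch of the subtraction idea, assuming both scans have already been binarized (1 = ink, 0 = paper) and differ only by a small translation. The function names and the `tolerance` parameter are hypothetical, chosen for illustration. Plain Python lists are used so the logic is visible; on real scans a library such as OpenCV (for example `cv2.absdiff` after alignment) would be the more practical choice.

```python
# Binary images are lists of rows: 1 = ink, 0 = paper.
# All names below are illustrative assumptions, not an existing API.

def shifted(img, dx, dy):
    """Return img translated by (dx, dy), padding with 0 (paper)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def overlap(a, b):
    """Count pixels that are ink in both images."""
    return sum(pa & pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def subtract_blank(filled, blank, tolerance=2):
    """Align the blank form to the filled scan, then remove the form's ink.

    Tries every shift within +/- tolerance pixels and keeps the one where
    the blank form overlaps the filled scan the most.
    """
    offsets = [(dx, dy)
               for dy in range(-tolerance, tolerance + 1)
               for dx in range(-tolerance, tolerance + 1)]
    best = max(offsets, key=lambda o: overlap(filled, shifted(blank, *o)))
    aligned = shifted(blank, *best)
    # Keep only ink that is in the filled scan but not in the aligned blank.
    residue = [[pf & (1 - pb) for pf, pb in zip(rf, rb)]
               for rf, rb in zip(filled, aligned)]
    return best, residue
```

Calling `subtract_blank(filled, blank)` returns the estimated offset and a mask of the ink that is left over. One caveat with this approach: handwriting that crosses a printed line is erased along with the line, and rotation or scaling between scans is not handled, so some noise reduction and a more robust registration step would still be needed.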

Any suggestions or comments would be greatly appreciated.

Upvotes: 0

Views: 2345

Answers (1)

Dmitrii Z.

Reputation: 2357

As the Tesseract FAQ says, Tesseract is not recommended if you need reliable handwriting recognition. I would recommend looking into commercial products such as the Microsoft OCR API (scroll down to "Read handwritten text from images"); you can try it online and call the API from your application.

Another option is ABBYY OCR, which has many useful functions for recognizing tables, complex documents, and so on. You can read more here.

As for free alternatives, the only thing that comes to mind is the Lipi Toolkit.

As for detecting the text: it really depends on the input. In general, if your form is more or less the same every time, it would be best to simply measure the form and use predefined positions in which to search for text. Otherwise, OpenCV is the right technology for locating text; there are plenty of tutorials online and good answers here on Stack Overflow. For example, take a look at the answer by Silencer on detection using MSER.

Upvotes: 1
