Reputation:
I have images that are full of text, i.e. scanned documents stored as image files (*.tiff). Optical character recognition methods seem to recognize only the normal printed form of the alphabet, but these images contain text written in running letters (cursive-style, joined-up script). How can I identify that text and convert it into text files?
Upvotes: 2
Views: 678
Reputation: 93
With tesseract-ocr you can train the engine on your own characters. If you are sure the documents use a running-letter font, you can use samples of that font as the training data instead of the default data that ships with the library. I haven't tried it with running letters myself, but this library is a good place to start.
http://code.google.com/p/tesseract-ocr/
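Once you have a trained data file, running it over a scan could look roughly like the sketch below. It only wraps the `tesseract` command-line tool, so it assumes the executable is installed and on your PATH. The language name `cursive`, the file names, and the two helper functions are hypothetical placeholders for illustration, and the `--tessdata-dir` option is only available in newer Tesseract releases.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(image_path: str, output_base: str,
                            lang: str = "eng",
                            tessdata_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    cmd = ["tesseract", str(image_path), str(output_base), "-l", lang]
    if tessdata_dir is not None:
        # Folder that holds <lang>.traineddata (newer Tesseract releases only).
        cmd += ["--tessdata-dir", str(tessdata_dir)]
    return cmd


def ocr_to_text_file(image_path: str, output_base: str,
                     lang: str = "eng",
                     tessdata_dir: Optional[str] = None) -> Path:
    """Run Tesseract on one image and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image_path, output_base, lang, tessdata_dir)
    subprocess.run(cmd, check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(str(output_base) + ".txt")


if __name__ == "__main__":
    # "cursive" is a made-up name for your own trained data file
    # (cursive.traineddata); "scan.tiff" stands in for your scanned page.
    print(build_tesseract_command("scan.tiff", "scan_text", lang="cursive"))
```

Calling `ocr_to_text_file("scan.tiff", "scan_text", lang="cursive")` would then write the recognized text to `scan_text.txt`. How good the result is depends on how well the training samples match the script in your scans.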
Regards, Prasanna.
Upvotes: 1