Emil

Reputation: 13789

Generate font from an image of text

Is it possible to generate a font from the image below?

My idea is to build a font specifically for the text in the image below by manually selecting portions of the image and mapping them to letters. I would then generate a font from this mapping and use it to make the text readable by an OCR engine. Is it possible to generate a font this way with any open-source implementation? Also, please suggest any good OCR engines.

[Image of the text]

Upvotes: 1

Views: 2453

Answers (1)

Andrew Cash

Reputation: 2394

Abbyy FineReader 10 gives better results than expected, but predictably gets confused where characters touch.

Your problem is that the line spacing is too small. The descenders of each line overlap the bounding boxes of the characters in the line directly below. This makes character segmentation almost impossible because the characters touch and overlap, and there are far too many combinations of overlapping characters to train for. The 'g' and 'y' characters are the worst offenders.

A double-spaced version of this page would probably OCR reasonably well.

A custom solution that segmented and separated each line, combined with a good dictionary, would definitely improve the results. There would still be some errors to correct manually, though. The custom routine would have to deal with the ascenders and descenders and segment the image into lines, which can then be fed to a decent OCR engine. One way would be to analyse every character blob on the page and allocate it to a line. Leptonica (www.leptonica.com - C Imaging Library) would probably make this job a little easier.
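As a rough illustration of the "allocate every blob to a line" idea, here is a minimal Python sketch. It assumes you already have a bounding box (x, y, w, h) for each character blob, for example from a connected-component pass in Leptonica or another imaging library. The function name, the box format and the tolerance rule are all my own assumptions for illustration, not part of any library. It groups blobs by vertical centre rather than by bounding-box overlap, so a descender that reaches into the line below still stays with its own line.

```python
from statistics import median


def assign_blobs_to_lines(boxes, tolerance=None):
    """Group character bounding boxes (x, y, w, h) into text lines.

    Hypothetical helper: boxes are assumed to come from an earlier
    connected-component step, with y increasing down the page.
    """
    if not boxes:
        return []

    def centre(box):
        # Vertical centre of a box.
        return box[1] + box[3] / 2

    if tolerance is None:
        # Assumption: half the median blob height is a usable tolerance.
        tolerance = 0.5 * median(h for _, _, _, h in boxes)

    ordered = sorted(boxes, key=centre)
    lines = [[ordered[0]]]
    for box in ordered[1:]:
        line_centre = median(centre(b) for b in lines[-1])
        if abs(centre(box) - line_centre) <= tolerance:
            lines[-1].append(box)   # same text line
        else:
            lines.append([box])     # start a new text line

    # Return each line with its blobs in left-to-right reading order.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


# Made-up boxes: the third one is a 'g'-like blob whose descender
# reaches into the vertical range of the second line.
boxes = [(0, 10, 8, 10), (10, 10, 8, 10), (20, 10, 8, 14),
         (0, 22, 8, 10), (10, 22, 8, 10)]
lines = assign_blobs_to_lines(boxes)
```

On a real page the tolerance would need tuning, and blobs where two characters from different lines are physically joined would still have to be split before this step.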

I would not try this without increasing the resolution to 200 or 300 dpi first.
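To get a feel for what that means in pixels, the sketch below works out the target image size for a given resolution change. The function name and the example numbers are hypothetical, and you need to know (or estimate) the DPI of the source image. Bear in mind that resampling an existing image cannot add detail that was never captured, so rescanning at the higher resolution is preferable where that is possible.

```python
def target_size(width_px, height_px, source_dpi, target_dpi=300):
    """Return (width, height) in pixels after resampling to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# Hypothetical example: an 800 x 600 pixel image assumed to be 100 dpi.
new_size = target_size(800, 600, source_dpi=100, target_dpi=300)
```

The resulting size can then be passed to whatever resize routine your imaging library provides.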

With this custom solution, training a font becomes an option if the OCR engine does a poor job initially.

Abbyy (www.abbyy.com) or Google Tesseract OCR 3.00 would be a good place to start.
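If you go the Tesseract route, one possible way to drive it from a script is through its command-line tool, which takes an input image, an output base name and an optional language with -l. The Python sketch below assumes the tesseract executable is installed and on the PATH and that each segmented line has been saved as its own image file. The helper names and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_line_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognised text.

    Assumes the tesseract executable is available on the PATH.
    Tesseract writes its result to <outputbase>.txt.
    """
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


# Hypothetical file name for one segmented line image.
command = tesseract_command("line_01.tif", "line_01")
```

Calling ocr_line_image("line_01.tif") would then run the engine on that one line and return its text, so the results can be stitched back together in line order.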

There are no guarantees that any of this will work, though. This is quite a difficult page to OCR, and you need to work out whether it would be better to have it typed up manually overseas. That depends on the number of pages you need to process.

Upvotes: 1
