user502187

Reputation:

OCR that exports overlay HTML aligned with an image?

I am looking for OCR software that renders HTML as an overlay on an image. I currently use an unnamed product whose OCR function performs inline OCR on PDF documents that contain images.

The inline OCR is very handy: it makes the text inside the PDF's images searchable, and text can be highlighted directly in the document because the OCR text is aligned with the underlying image. Unfortunately, I can neither export nor store the inline OCR from within the unnamed product.

Is there other software that can perform inline OCR and export the result? I am especially interested in exporting to HTML consisting of positioned paragraphs that are aligned with the underlying image.

See also:
https://stackoverflow.com/questions/11404805/ocr-and-the-location-of-the-image-where-the-scanned-document-came-from

Upvotes: 6

Views: 2392

Answers (2)

Marc Greenstock

Reputation: 11668

I've found the Google Drive API helpful when I need OCR. It attempts to preserve the document's formatting, and the result can of course be exported as HTML.

Take a look at the following links:

Upvotes: 4

user2503170

Reputation: 155

I have a possible solution for you, but it has some drawbacks that might hinder your end goal.

First convert the image file to PDF at http://finereader.abbyyonline.com. Then convert the PDF to HTML at http://document.online-convert.com/convert-to-html.

This solution works for documents about the size of a sheet of paper, and the final result is HTML with the image overlay. If all you want is the HTML with the image's formatting, just make the images fully transparent.
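
For reference, here is a rough sketch of what such an overlay page can look like if you build it yourself from OCR word boxes. It is not what either of the sites above produces. The function name `overlay_html`, the `(text, x, y, w, h)` box format in pixels, and the file name `scan.png` are all assumptions made up for illustration; the boxes would have to come from whatever OCR engine you use.

```python
from html import escape


def overlay_html(image_src, width, height, boxes, image_opacity=1.0, text_opacity=0.0):
    """Return an HTML page with OCR text positioned over the scanned image.

    boxes is a list of (text, x, y, w, h) tuples in image pixels
    (a hypothetical format; adapt it to your OCR engine's output).
    """
    spans = []
    for text, x, y, w, h in boxes:
        # Each text run is absolutely positioned at its box on the image.
        # With text_opacity=0 the text is invisible but can still be
        # selected and found by the browser's search.
        style = (
            f"position:absolute;left:{x}px;top:{y}px;"
            f"width:{w}px;height:{h}px;"
            f"font-size:{h}px;line-height:{h}px;white-space:nowrap;"
            f"color:rgba(0,0,0,{text_opacity});"
        )
        spans.append(f'<span style="{style}">{escape(text)}</span>')

    # The container has the image's size, so pixel coordinates line up.
    return (
        "<!DOCTYPE html>\n<html><body>\n"
        f'<div style="position:relative;width:{width}px;height:{height}px;">\n'
        f'<img src="{escape(image_src)}" width="{width}" height="{height}" '
        f'style="position:absolute;left:0;top:0;opacity:{image_opacity};">\n'
        + "\n".join(spans)
        + "\n</div>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Made-up example boxes, only to show the call.
    demo_boxes = [("Invoice", 40, 30, 120, 24), ("Total: 42", 40, 70, 140, 18)]
    print(overlay_html("scan.png", 800, 600, demo_boxes))
```

Setting `image_opacity=0.0` and `text_opacity=1.0` gives the variant mentioned above: only the positioned text stays visible and the image is fully transparent.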

Upvotes: 3
