I am looking for OCR software that produces an HTML overlay for a scanned image. I am currently using an unnamed product with an OCR function that performs inline OCR on PDF documents containing images.
The inline OCR is very handy: it lets me search image-based PDF documents for text, and text can be highlighted directly in the document because the OCR text is aligned with the underlying image. Unfortunately, I can neither export nor store the inline OCR from within the unnamed product.
Is there other software that can perform inline OCR and export the result? I would be especially interested in exporting to HTML consisting of positioned paragraphs that are aligned with the underlying image.
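The kind of output described above can be illustrated with a minimal Python sketch. It assumes the OCR engine has already produced paragraph boxes as `(text, left, top, width, height)` in pixel coordinates of the page image; that input format and the function name `build_overlay_html` are hypothetical, not the format of any particular product. Each paragraph is absolutely positioned over the image with transparent text, so it stays searchable and selectable while the scan remains visible.

```python
from html import escape

# Hypothetical input format: (text, left, top, width, height) in pixels
# of the page image, as an OCR engine might report each paragraph.
Box = tuple[str, int, int, int, int]


def build_overlay_html(image_src: str, width: int, height: int, boxes: list[Box]) -> str:
    """Return an HTML page with the scanned image and positioned OCR text on top."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8"><style>',
        ".page { position: relative; }",
        # Transparent text keeps the words searchable and selectable
        # while the underlying image stays visible.
        ".ocr { position: absolute; color: transparent; white-space: pre; margin: 0; }",
        "</style></head><body>",
        f'<div class="page" style="width:{width}px;height:{height}px">',
        f'<img src="{escape(image_src, quote=True)}" width="{width}" height="{height}" alt="">',
    ]
    for text, left, top, w, h in boxes:
        # Place each paragraph over the image region it was read from,
        # relative to the .page container.
        parts.append(
            f'<p class="ocr" style="left:{left}px;top:{top}px;'
            f'width:{w}px;height:{h}px;font-size:{h}px">{escape(text)}</p>'
        )
    parts.append("</div></body></html>")
    return "\n".join(parts)


# Usage with made-up coordinates:
# page = build_overlay_html("page1.png", 1240, 1754, [("Invoice No. 42", 100, 80, 400, 30)])
```

Using the image's own pixel coordinates for both the `<img>` size and the paragraph offsets is what keeps the text aligned; if the page is scaled, both must be scaled together.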
Upvotes: 6
Views: 2392
Reputation: 11668
I've found the Google Drive API helpful when I need OCR. It attempts to preserve the formatting of the document, which can then be exported as HTML.
Take a look at the following links:
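A rough sketch of the Drive approach in Python, assuming the Drive v3 API through google-api-python-client: uploading an image or PDF while asking Drive to convert it to the Google Docs MIME type runs Drive's OCR, and `files().export` with `mimeType="text/html"` downloads the result. The helper name `drive_ocr_to_html` is hypothetical, the Drive client and upload object are passed in (authentication is not shown), and how closely the exported HTML follows the original page layout is worth checking on your own documents.

```python
# Google Docs native MIME type; converting an uploaded image or PDF to it
# makes Drive run OCR on the content.
GOOGLE_DOC = "application/vnd.google-apps.document"


def drive_ocr_to_html(service, media_body, name: str, keep: bool = False) -> str:
    """Upload media_body as a Google Doc (triggering OCR) and export it as HTML.

    Assumes `service` is a Drive v3 client from googleapiclient.discovery.build
    and `media_body` is a googleapiclient.http.MediaFileUpload for the scan.
    """
    created = service.files().create(
        body={"name": name, "mimeType": GOOGLE_DOC},
        media_body=media_body,
        fields="id",
    ).execute()
    file_id = created["id"]
    try:
        # export() downloads the converted document; execute() returns bytes.
        data = service.files().export(fileId=file_id, mimeType="text/html").execute()
    finally:
        if not keep:
            # Remove the temporary Google Doc from Drive.
            service.files().delete(fileId=file_id).execute()
    return data.decode("utf-8")


# Usage (requires credentials, not shown):
# from googleapiclient.discovery import build
# from googleapiclient.http import MediaFileUpload
# service = build("drive", "v3", credentials=creds)
# html = drive_ocr_to_html(service, MediaFileUpload("scan.png", mimetype="image/png"), "scan")
```

Passing the client in, rather than building it inside the helper, keeps authentication out of the OCR logic and makes the function easy to try against a stub.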
Upvotes: 4
Reputation: 155
I have a possible solution for you, but it has some drawbacks that might hinder your end goal.
First, convert the image file to PDF at http://finereader.abbyyonline.com. Then convert the PDF to HTML at http://document.online-convert.com/convert-to-html.
This works for documents about the size of a sheet of paper, and the final HTML contains the text overlaid on the image. If all you want is the HTML with the layout from the image, just make the images fully transparent.
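Making the images transparent can be done with a single CSS rule. Below is a minimal Python sketch, assuming you post-process the converted HTML yourself; the function name `make_images_transparent` is hypothetical. `opacity: 0` hides each image but keeps its layout box, so text positioned relative to the page does not shift.

```python
import re

# CSS rule that hides every image while keeping the space it occupies.
HIDE_IMAGES_CSS = "<style>img { opacity: 0; }</style>"


def make_images_transparent(html_text: str) -> str:
    """Insert a style rule that makes every <img> fully transparent."""
    # Put the rule just before the first </head>, preserving its original case.
    new_text, count = re.subn(
        r"</head>",
        lambda m: HIDE_IMAGES_CSS + m.group(0),
        html_text,
        count=1,
        flags=re.IGNORECASE,
    )
    if count:
        return new_text
    # No <head> section: prepend the rule; browsers still apply it.
    return HIDE_IMAGES_CSS + html_text


# Usage on a converted file:
# with open("converted.html", encoding="utf-8") as f:
#     cleaned = make_images_transparent(f.read())
```

`visibility: hidden` would have the same effect on layout; `opacity: 0` was chosen here only because it is the literal "fully transparent" setting.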
Upvotes: 3