Peter

Reputation: 1

hOCR format for tesseract

I have several HTML files in this format and tried to open them with gImageReader (Tesseract), but they won't open with either a .html or a .hocr extension. I also tried adding a line like:

<meta content="ocr_page ocr_table ocrx_block ocrx_word" name="ocr-capabilities"/>

but it still does not work.

Example file:

<div data-coords="0 0 1499 2375" dir="auto" class="ocr_page">
<div class="ocrx_block">
<p class="ocr_par"><span class="ocr_line" data-width="0.223482321547698" data-line-break="true"><span data-coords="291 152 335 192" class="ocrx_word">16</span> </span><span class="ocr_line" data-width="0.654436290860574" data-line-break="true" data-px=""><span data-coords="651 146 832 205" class="ocrx_word">Ludewigs</span> <span data-coords="835 145 846 201" class="ocrx_word">-</span> <span data-coords="849 143 968 201" class="ocrx_word">Orden</span> <span data-coords="972 143 981 199" class="ocrx_word">.</span> </span></p>
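One difference I can see: the hOCR specification stores bounding boxes in a title attribute, written as bbox x0 y0 x1 y1, while my files keep them in data-coords. I have not confirmed that this is what gImageReader rejects, so the following is only a minimal sketch of a conversion to try, assuming the four numbers in data-coords are already in x0 y0 x1 y1 order. The function name coords_to_bbox is made up for illustration.

```python
import re

# Matches data-coords="x0 y0 x1 y1" as it appears in the example file.
# Assumption: the four integers are left, top, right, bottom in pixels.
COORDS = re.compile(r'data-coords="(\d+) (\d+) (\d+) (\d+)"')


def coords_to_bbox(html: str) -> str:
    """Rewrite each data-coords attribute as an hOCR-style title="bbox ..."."""
    return COORDS.sub(r'title="bbox \1 \2 \3 \4"', html)


sample = '<span data-coords="291 152 335 192" class="ocrx_word">16</span>'
print(coords_to_bbox(sample))
# prints: <span title="bbox 291 152 335 192" class="ocrx_word">16</span>
```

This only rewrites the attribute. The ocr_line spans in my files carry no coordinates at all, and the file has no surrounding html/head/body, so more changes may be needed before an hOCR reader accepts it.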

Kind regards and thanks

Upvotes: 0

Views: 110

Answers (0)