Reputation: 1
I have several HTML files in this format and tried to open them with gImageReader (Tesseract), but it won't open them with either a .html or a .hocr extension. I also tried adding a line like:
<meta content="ocr_page ocr_table ocrx_block ocrx_word" name="ocr-capabilities"/>
but it still does not work.
Example file:
<div data-coords="0 0 1499 2375" dir="auto" class="ocr_page">
<div class="ocrx_block">
<p class="ocr_par"><span class="ocr_line" data-width="0.223482321547698" data-line-break="true"><span data-coords="291 152 335 192" class="ocrx_word">16</span> </span><span class="ocr_line" data-width="0.654436290860574" data-line-break="true" data-px=""><span data-coords="651 146 832 205" class="ocrx_word">Ludewigs</span> <span data-coords="835 145 846 201" class="ocrx_word">-</span> <span data-coords="849 143 968 201" class="ocrx_word">Orden</span> <span data-coords="972 143 981 199" class="ocrx_word">.</span> </span></p>
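For comparison, the hOCR that Tesseract itself writes stores each bounding box in a `title="bbox x0 y0 x1 y1"` attribute, whereas the file above uses `data-coords`. A minimal sketch of rewriting that attribute follows. It assumes the attribute name is the relevant difference, which has not been verified against gImageReader, and the helper name `coords_to_bbox` is made up for illustration:

```python
import re

# Assumption: only data-coords needs to change; every other attribute
# (data-width, data-line-break, ...) is left untouched.
COORDS = re.compile(r'data-coords="(\d+) (\d+) (\d+) (\d+)"')

def coords_to_bbox(html: str) -> str:
    """Replace data-coords="x0 y0 x1 y1" with title="bbox x0 y0 x1 y1"."""
    return COORDS.sub(r'title="bbox \1 \2 \3 \4"', html)

sample = '<span data-coords="291 152 335 192" class="ocrx_word">16</span>'
print(coords_to_bbox(sample))
# prints: <span title="bbox 291 152 335 192" class="ocrx_word">16</span>
```

Note that the `ocr_line` spans in the example carry no coordinates at all, so this rewrite alone may not be enough for a viewer that expects a bbox on every line.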
Kind regards and thanks
Upvotes: 0
Views: 110