Windows Tesseract OCR: getting scattered hOCR output instead of a clean, standard format

Question

Quick help would be highly appreciated. I am extracting text from a TIFF image with Tesseract OCR, and the output I want is hOCR (HTML). The content of the output is correct, but when I open the file in Notepad the formatting looks very disorganized. When I open the same file in Notepad++, it is cleanly formatted.

The Windows command line I am using is:

tesseract "Path\image.tiff" "Path\output" hocr

I need help getting the organized hOCR layout in Notepad, as in the enclosed example.

How do I get organized hOCR data when I open the file with Notepad?


Answers (1)
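
One likely cause, offered as an assumption rather than something confirmed for this particular file: Tesseract writes the hOCR file with Unix-style LF line endings, and older versions of Windows Notepad only break lines on CRLF, so everything runs together, while Notepad++ understands both. If that is what is happening here, rewriting the line endings as CRLF should make the file look the same in both editors. A minimal Python sketch of that conversion follows; the function names are made up for illustration.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Normalize every line ending to CRLF without doubling existing CRLFs."""
    # Collapse CRLF and lone CR to LF first, then expand each LF to CRLF.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write a CRLF-terminated copy to dst."""
    # Working on bytes avoids guessing the text encoding of the hOCR file.
    Path(dst).write_bytes(to_crlf(Path(src).read_bytes()))
```

For example, `convert_file(r"Path\output.hocr", r"Path\output_crlf.hocr")` would write a copy for Notepad; the `.hocr` file name is an assumption about what your Tesseract version produces, so adjust it to the file you actually get. In Notepad++ you can check the assumption first: the status bar shows whether the file uses Unix (LF) or Windows (CR LF) line endings.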
