Joe

Reputation: 13

Tesseract OCR on Windows: getting scattered hOCR output instead of a clean standard format

Quick help would be highly appreciated. I am extracting text from a TIFF image with Tesseract OCR, and the output I want is hOCR (HTML). The content of the output is perfect, but the format looks very disorganized when I open the file in Notepad. When I open the same file in Notepad++, the format is clean.

The Windows command line is given below:

Tesseract "Path\image.tiff" "Path\output" HOCR

I need your help getting the organized hOCR format in Notepad, as enclosed.

How do I get an organized hOCR output format when I open the file with Notepad? The enclosures show the present output in Notepad and how the same data is displayed in Notepad++.

Upvotes: 0

Views: 728

Answers (1)

user898678

Reputation: 3328

The problem is not in Tesseract but in Notepad. Use a more capable text editor such as Notepad++ or ConTEXT.
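A likely explanation, though the question does not confirm it, is that the .hocr file uses Unix-style LF line endings. Older versions of Windows Notepad only break lines on CRLF, so an LF-only file appears as one run of text, while Notepad++ handles both. If the file really must be readable in Notepad, a minimal Python sketch like the one below could rewrite it with CRLF endings. The function names `to_crlf` and `convert_file` are made up for illustration, and the sketch assumes line endings are the cause.

```python
import sys
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending normalized to CRLF."""
    # Collapse existing CRLF and lone CR to LF first so nothing gets doubled.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def convert_file(path: str) -> None:
    """Rewrite the file at path in place with CRLF line endings."""
    p = Path(path)
    # Work on raw bytes so the text encoding of the file is left untouched.
    p.write_bytes(to_crlf(p.read_bytes()))


if __name__ == "__main__":
    # Each command-line argument is treated as a file to convert.
    for name in sys.argv[1:]:
        convert_file(name)
```

Saved under a hypothetical name such as `to_crlf.py`, it would be run as `python to_crlf.py "Path\output.hocr"`. Since it overwrites the file in place, keeping a copy of the original first is advisable.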

Upvotes: 0
