Reputation: 13
Quick help would be highly appreciated. I am extracting the text from a TIFF image with Tesseract OCR. The output I am looking for is hOCR (HTML). The content of the output is perfect, but the formatting looks very disorganized when I open the file in Notepad. When I open the same file in Notepad++, it is cleanly formatted.
The Windows command line is given below:
Tesseract "Path\image.tiff" "Path\output" HOCR
I need your help getting the hOCR output to display in an organized format in Notepad, as enclosed.
How do I get organized hOCR data when I open the file with Notepad?
Upvotes: 0
Views: 728
Reputation: 3328
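The problem is not in Tesseract but in Notepad. Most likely the output file uses Unix-style LF line endings, which older versions of Notepad do not display as line breaks, so everything runs together. Use a more capable text editor such as Notepad++ or ConTEXT.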
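If you really need the file to look right in Notepad itself, one option is to rewrite it with Windows-style CRLF line endings. This is only a minimal sketch in Python, and it assumes that LF-only line endings are the cause, which the question does not confirm. The function names `to_crlf` and `convert_file` are made up for illustration and are not part of Tesseract.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Return data with every line ending normalized to CRLF."""
    # Collapse existing CRLF and lone CR to LF first, so that
    # already-correct line endings are not doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write a CRLF copy to dst."""
    # Working on bytes leaves the text encoding of the hOCR file untouched.
    Path(dst).write_bytes(to_crlf(Path(src).read_bytes()))
```

You would call something like `convert_file("output.hocr", "output_crlf.hocr")`, where both file names are placeholders, and then open the second file in Notepad.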
Upvotes: 0