Tesseract, OCR and text based layout

Question

I'm trying to build a small application (C#) that runs OCR on some images and extracts the raw text with the layout roughly intact (using tabs, spaces or whatever else works to position the text in the output according to the original layout).

I found this post from 2018, "Tesseract OCR Text Position", but was wondering whether anyone knows if anything has changed in Tesseract since then that makes this goal easier to achieve?

Alternatively, it doesn't have to be Tesseract, but from what I can find, it seems to be the go-to package if your project doesn't call for a commercial solution.
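
To make the goal concrete, here is a rough sketch of the kind of output I'm after. It is written in Python only for brevity (the same logic should port directly to C#) and assumes the OCR engine can report a pixel bounding box for each word. The `Word` type, the `render_layout` function, and the `char_width` and `line_tolerance` parameters are all hypothetical names made up for this illustration; they are not part of Tesseract or any wrapper.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its position in pixels (hypothetical shape)."""
    text: str
    left: int
    top: int
    height: int


def render_layout(words, char_width=10.0, line_tolerance=0.5):
    """Place words on a character grid so columns roughly match the image.

    char_width is an assumed average glyph width in pixels; line_tolerance
    is the fraction of a word's height by which its top edge may differ
    from the first word of a line and still count as the same line.
    """
    lines = []  # each entry is the list of words on one visual line
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        same_line = (
            lines
            and abs(word.top - lines[-1][0].top) <= line_tolerance * word.height
        )
        if same_line:
            lines[-1].append(word)
        else:
            lines.append([word])

    rendered = []
    for line in lines:
        text = ""
        for word in sorted(line, key=lambda w: w.left):
            column = int(word.left / char_width)
            # If the estimated column would overlap the text already
            # written, fall back to a single separating space.
            if text and column <= len(text):
                column = len(text) + 1
            text = text.ljust(column) + word.text
        rendered.append(text)
    return "\n".join(rendered)


# Example: a two-column table whose rows are slightly misaligned vertically.
sample = [
    Word("Name", 0, 0, 20),
    Word("Qty", 200, 2, 20),
    Word("Apple", 0, 40, 20),
    Word("3", 200, 41, 20),
]
print(render_layout(sample))
```

A fixed `char_width` is the weak point of this approach: it only holds up for roughly uniform font sizes, which is part of why I'm asking whether there is better built-in support.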


Answers (0)
