Reputation: 461
I am running OCR on a series of images using tess4j as a wrapper for Tesseract from Java. The OCR step still takes a significant amount of time (sometimes as long as 5 seconds), and I am trying to speed it up.
I do my own preprocessing and binarization of the image, so Tesseract does not need to run its Otsu binarization.
I have read a tutorial for iOS that shows how to skip this image-processing step, but I can't find an equivalent for tess4j.
The tutorial is here: https://github.com/gali8/Tesseract-OCR-iOS/wiki/Tips-for-Improving-OCR-Results -
"... if you've already performed your own pre-processing/thresholding [...] you will probably want to bypass the internal Tesseract thresholding step. "
Does anybody know how I could use tess4j (from Java) in a way that skips the Otsu binarization?
Upvotes: 3
Views: 2341
Reputation: 8345
Check the tesseract-ocr parameter list for any applicable settings. That said, I have read that if you pass in an image that is already binarized, Tesseract will skip its own thresholding step (source).
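As a rough sketch of what "send in a binarized image" could look like from Java: the helper below converts an image to a 1-bit `BufferedImage.TYPE_BYTE_BINARY` using only the standard library. The class name `PreBinarize`, the method `toBinary`, and the fixed threshold of 128 are made up for illustration; in your case your own binarization would produce the black/white decision. I am assuming that a 1-bit image handed to tess4j (for example via `doOCR(BufferedImage)`) reaches Tesseract as 1 bpp so the internal thresholding has nothing left to do; I have not measured this, so time it on your own images.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of tess4j: packs an image into a 1-bit
// black/white BufferedImage before it is handed to the OCR engine.
class PreBinarize {

    // Returns a TYPE_BYTE_BINARY copy of src. Pixels whose luminance is
    // >= threshold become white, all others black. The fixed threshold is
    // a stand-in for whatever binarization you already perform.
    static BufferedImage toBinary(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luma weights.
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic 2x1 test image: one dark pixel, one light pixel.
        BufferedImage src = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0x0A0A0A);
        src.setRGB(1, 0, 0xF0F0F0);

        BufferedImage bin = toBinary(src, 128);
        System.out.println("bits per pixel: " + bin.getColorModel().getPixelSize());

        // Assumed next step (requires tess4j on the classpath):
        // String text = new net.sourceforge.tess4j.Tesseract().doOCR(bin);
    }
}
```

If this makes no difference to the timing, the thresholding step is probably not where the 5 seconds go, and the parameter list is the next place to look.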
Upvotes: 1