I am trying to use Tesseract.js for OCR, but I'm not able to get the 'preserve_interword_spaces' option to work. Here is what I am trying:
```javascript
Tesseract.recognize(
  element.files[0],
  'eng',
  {
    preserve_interword_spaces: 1,
    logger: progress => {
      console.log(progress);
      progressBar.querySelector("div").innerText = progress.status;
      progressBar.querySelector("progress").value = progress.progress;
    }
  }
).then(result => {
  // etc.
});
```
The OCR output collapses runs of multiple spaces into a single space. How can I keep them?
I'd prefer to call .recognize() this way rather than using await. I know preserve_interword_spaces is supported, since I can see it in the documentation here and here, but I'm not sure how to get it to work in my case.
Update: I resolved the issue by switching to an async approach (creating a worker and awaiting its methods) instead of calling Tesseract.recognize(). As the documentation states, Tesseract.recognize() is meant only for quick tasks, not more involved ones.
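A minimal sketch of the worker-based approach, assuming Tesseract.js v5 loaded as the global `Tesseract` (as in the question's browser setup). Engine variables such as `preserve_interword_spaces` are set with `worker.setParameters()`; as far as I can tell, the third argument of `Tesseract.recognize()` is only used as worker options (e.g. `logger`), which would explain why the setting had no effect there. The function name `recognizeWithSpaces` and its `deps` argument are hypothetical; `deps` only exists so a fake `createWorker` can be injected for testing. Versions v2 to v4 use a different setup sequence (`load()`, `loadLanguage()`, `initialize()`), so check the docs for your version.

```javascript
// Sketch assuming Tesseract.js v5, where createWorker(langs, oem, options)
// resolves to a worker that has already loaded and initialized the language.
// recognizeWithSpaces and `deps` are hypothetical names: `deps.createWorker`
// lets tests inject a fake; otherwise the global Tesseract.createWorker is used.
async function recognizeWithSpaces(image, onProgress = () => {}, deps = {}) {
  const createWorker = deps.createWorker || Tesseract.createWorker;
  // 'eng' = language, 1 = LSTM engine mode; logger receives progress updates.
  const worker = await createWorker('eng', 1, { logger: onProgress });
  try {
    // Tesseract engine variables are set on the worker, with string values.
    await worker.setParameters({ preserve_interword_spaces: '1' });
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Release the worker even if recognition throws.
    await worker.terminate();
  }
}
```

Because the function returns a promise, it can still be chained with `.then()` as the question prefers, e.g. `recognizeWithSpaces(element.files[0], progress => console.log(progress)).then(text => console.log(text));`.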