garson

Reputation: 1617

Using 'preserve_interword_spaces' in tesseract.js

I am trying to use Tesseract.js for OCR, but I'm not able to get the 'preserve_interword_spaces' option to work. Here is what I am trying:

    Tesseract.recognize(
      element.files[0],
      'eng',
      {
        preserve_interword_spaces: 1,
        logger: progress => {
          console.log(progress);
          progressBar.querySelector("div").innerText = progress.status;
          progressBar.querySelector("progress").value = progress.progress;
        }
      }
    ).then(/* etc */)

The OCR is coming out with multiple spaces combined into one. Help?

I'd prefer to call .recognize() this way, with .then(), rather than using async/await. I know preserve_interword_spaces is supported, since I can see it in the documentation here and here, but I'm not sure how to get it to work in my case.

Upvotes: 1

Views: 704

Answers (1)

garson

Reputation: 1617

Just an update: I was able to resolve the issue by switching to the async/await approach. As the documentation states, Tesseract.recognize() is only meant for quick tasks, not more involved ones.
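For anyone landing here, below is a minimal sketch of what the async approach might look like. It is not the exact code I ended up with. It assumes the v2-style worker API (createWorker, load, loadLanguage, initialize), so check it against the version you have installed, since newer releases changed how workers are created. The function name recognizeKeepingSpaces and its tesseract argument are made up for illustration: pass in the Tesseract global or the imported module. As far as I can tell, the key point is that engine parameters such as preserve_interword_spaces are set with worker.setParameters() rather than in the options object given to Tesseract.recognize().

```javascript
// Sketch only: assumes the v2-style Tesseract.js worker API.
// `tesseract` is the library object (the Tesseract global or the imported module).
// `recognizeKeepingSpaces` is a hypothetical helper name, not part of the library.
async function recognizeKeepingSpaces(tesseract, image, onProgress) {
  // Worker options (such as the logger) are passed when the worker is created.
  const worker = tesseract.createWorker({ logger: onProgress });
  try {
    await worker.load();
    await worker.loadLanguage("eng");
    await worker.initialize("eng");
    // Engine parameters are set on the worker, after initialization.
    await worker.setParameters({ preserve_interword_spaces: "1" });
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

Since an async function returns a promise, the calling code can still use .then() as in the question, for example recognizeKeepingSpaces(Tesseract, element.files[0], logger).then(text => { /* etc */ }).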

Upvotes: 0
