Guimby

Angular + Tesseract.js (and opencv.js)

I am currently trying to use tesseract.js in Angular to perform OCR on images that have previously been processed with opencv.js.

Image manipulation with opencv.js now works really well, but I can't figure out what's wrong with my various attempts at using tesseract.js...

When I follow tutorials on the web, everything works and I can run OCR on the default example image. For example (only the relevant part):

    const exampleImage = 'https://tesseract.projectnaptha.com/img/eng_bw.png';

    const worker = Tesseract.createWorker({
      logger: m => console.log(m)
    });

    Tesseract.setLogging(true);
    work();

    async function work() {
      await worker.load();
      await worker.loadLanguage('eng');
      await worker.initialize('eng');

      const result = await worker.detect(exampleImage);
      console.log(result.data);

      await worker.terminate();
    }

But when I try the same thing with an image processed by opencv.js, whether as a cv.Mat or via the resulting HTML canvas, I always get the same error:

tesseract.js error : TypeError: Cannot read property 'SetImage' of null

I also get this error: Error in pixReadMem: size < 12

I don't understand what I'm doing wrong. I suspect the problem lies in how I pass the image to tesseract.js, but none of the approaches I've tried has worked, so I'm asking for your help.
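One plausible reading of the pixReadMem error (an assumption, not confirmed in this thread): pixReadMem is a Leptonica function that decodes an image *file* held in memory, such as PNG, JPEG or BMP, and identifies the format from its first bytes. `getImageData().data`, by contrast, holds raw RGBA pixel values with no file header. The hypothetical helper `sniffImageFormat` below (not part of tesseract.js or opencv.js) checks the standard signatures of a few formats; its 12-byte minimum mirrors the `size < 12` guard named in the error message.

```javascript
// Hypothetical helper: guess the image file format from a buffer's leading bytes.
// Raw pixel data (e.g. ImageData.data from getImageData()) carries no signature,
// so it is reported as 'unknown'.
function sniffImageFormat(input) {
  // Accept Uint8Array/Buffer directly; copy other typed arrays, wrap ArrayBuffers.
  const b = input instanceof Uint8Array ? input : new Uint8Array(input);
  // Mirrors the "size < 12" check mentioned in the pixReadMem error message.
  if (b.length < 12) return 'too-short';
  const startsWith = (sig) => sig.every((v, i) => b[i] === v);
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png';
  if (startsWith([0xff, 0xd8, 0xff])) return 'jpeg';
  if (startsWith([0x42, 0x4d])) return 'bmp'; // "BM"
  return 'unknown';
}
```

For example, a buffer taken from `getImageData()` on a white canvas (every byte 255) is reported as `unknown`, and an empty `ArrayBuffer` as `too-short`. Which of the two situations produced `size < 12` in the question is not stated, so this only narrows down what to inspect.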

Example of code that doesn't work:

    const worker = Tesseract.createWorker({
      logger: m => console.log(m)
    });

    Tesseract.setLogging(true);
    work(onlyDocument);

    async function work(d) {
      await worker.load();

      const ctx = document.getElementById('result').getContext('2d');

      const buffer = ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height).data.buffer;

      const result2 = await worker.detect(buffer);

      console.log(result2.data);

      await worker.terminate();
    }

I should add that I have tried every format I could think of for passing the image to tesseract.js (a buffer, the canvas, an array, ...).
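If raw pixels are part of the problem, they have to be wrapped in an image file format before a decoder can read them. In the browser the simpler route is usually to pass the canvas element itself, or `canvas.toDataURL('image/png')`; the tesseract.js image-format documentation lists canvas elements and base64 images among accepted inputs (worth double-checking for the installed version). To show what that encoding step adds, here is a hypothetical `rgbaToBmp` helper, not part of tesseract.js or opencv.js, that wraps RGBA pixels in a minimal uncompressed 24-bit BMP. It is an illustrative sketch under these assumptions, not the library's own conversion.

```javascript
// Hypothetical helper: wrap raw RGBA pixels (e.g. ImageData.data) in a minimal
// uncompressed 24-bit BMP file so the bytes start with a recognizable header.
function rgbaToBmp(rgba, width, height) {
  const rowSize = Math.ceil((width * 3) / 4) * 4; // each BMP row is padded to 4 bytes
  const pixelBytes = rowSize * height;
  const fileSize = 54 + pixelBytes; // 14-byte file header + 40-byte info header
  const buf = new ArrayBuffer(fileSize); // zero-initialized, so padding stays 0
  const view = new DataView(buf);
  const bytes = new Uint8Array(buf);

  // BITMAPFILEHEADER
  bytes[0] = 0x42; // 'B'
  bytes[1] = 0x4d; // 'M'
  view.setUint32(2, fileSize, true);
  view.setUint32(10, 54, true); // offset of the pixel data

  // BITMAPINFOHEADER
  view.setUint32(14, 40, true);  // info header size
  view.setInt32(18, width, true);
  view.setInt32(22, height, true); // positive height: rows stored bottom-up
  view.setUint16(26, 1, true);   // color planes
  view.setUint16(28, 24, true);  // bits per pixel
  view.setUint32(30, 0, true);   // BI_RGB (no compression)
  view.setUint32(34, pixelBytes, true);
  // Resolution and palette fields (bytes 38..53) are left as zero.

  for (let y = 0; y < height; y++) {
    const srcRow = y * width * 4;
    const dstRow = 54 + (height - 1 - y) * rowSize; // bottom image row is written first
    for (let x = 0; x < width; x++) {
      const s = srcRow + x * 4;
      const d = dstRow + x * 3;
      bytes[d] = rgba[s + 2];     // blue
      bytes[d + 1] = rgba[s + 1]; // green
      bytes[d + 2] = rgba[s];     // red (alpha is dropped)
    }
  }
  return bytes;
}
```

A call would look like `rgbaToBmp(imageData.data, imageData.width, imageData.height)`. The result could then be wrapped in a `Blob`; whether a particular tesseract.js version accepts that `Blob` is an assumption to verify.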

Answers (1)

Partho Ghosh

You need to initialize the Tesseract API (load a language, then call initialize()) before performing any OCR task. That resolves the following error:

tesseract.js error : TypeError: Cannot read property 'SetImage' of null

Solution:

    // Your async function
    async function work(d) {
      await worker.load();
      // Choose the language (e.g. 'eng') based on the trained data available
      await worker.loadLanguage('eng');
      await worker.initialize('eng');
      // Image-like input can now be given to the recognize() and detect() methods
      ...
      await worker.terminate();
    }

After initialization, any image-like input should work, whether or not the image has been preprocessed. Hope this solves your problem.

P.S. The tutorial sample initializes the API, which is why it throws no errors.
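The answer's snippet elides the OCR call itself. Below is a minimal sketch of the whole sequence, assuming the tesseract.js v2 worker methods already used in this thread (`load`, `loadLanguage`, `initialize`, `recognize`, `terminate`) and that `recognize()` resolves to an object whose `data.text` holds the recognized text. `recognizeOnce` is a hypothetical helper name, and `worker` can be any object exposing those async methods.

```javascript
// Hypothetical helper: run one OCR job with the order described in the answer:
// load -> loadLanguage -> initialize -> recognize -> terminate.
async function recognizeOnce(worker, image, lang = 'eng') {
  try {
    await worker.load();
    await worker.loadLanguage(lang); // fetch the trained data for `lang`
    // Per the answer, skipping initialize() leads to "Cannot read property 'SetImage' of null".
    await worker.initialize(lang);
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    await worker.terminate(); // release the worker even if a step above throws
  }
}
```

In the question's setup this might be called as `recognizeOnce(Tesseract.createWorker(), document.getElementById('result'))`, passing the processed canvas directly. The `try`/`finally` keeps a failed recognition from leaving the worker running.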
