ThePumpkinMaster

Reputation: 2351

How do I get text from a PNG in Node.js?

I tried using tesseract-ocr on this image: http://ablazinradio.com/site/wp-content/uploads/2015/06/lebron-james-cavs.jpg but it doesn't return any text such as "Cavs" or "23"; it returns nothing. Are there any other npm modules that would extract the text from that image, or can I do it manually somehow? Thanks.

Upvotes: 0

Views: 5771

Answers (2)

Himanshu Joshi

Reputation: 126

textract is the package that will help in a Node.js project (and tika for Python). The issue with textract is that it requires you to install OS-level tools for each format: pdftotext (for PDF), antiword (for Word documents), unrtf (for RTF), tesseract (for images) and drawingtotext (for DXF files). This works on a traditional server where you know and control the OS. In cloud functions or Lambda functions, where you do not control the OS, installing these tools may not be possible, and even where it is, it still costs performance.
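Because of those OS-level prerequisites, it can help to check what is actually installed before relying on textract. The sketch below is a minimal, hedged example using only Node's built-in `child_process`; the tool names are the ones listed above, while `isOnPath` and `missingTextractTools` are hypothetical helpers (not part of textract), and it assumes `which` (or `where` on Windows) is available to locate commands.

```javascript
const { spawnSync } = require("node:child_process");

// External programs listed above as textract prerequisites,
// keyed by the kind of file each one handles.
const TEXTRACT_TOOLS = {
  pdf: "pdftotext",
  doc: "antiword",
  rtf: "unrtf",
  image: "tesseract",
  dxf: "drawingtotext",
};

// Returns true if `command` can be located on the PATH.
// Assumption: `where` exists on Windows and `which` elsewhere; if the
// finder itself is missing, spawnSync sets `error`, status is null,
// and the command is reported as not found.
function isOnPath(command) {
  const finder = process.platform === "win32" ? "where" : "which";
  const result = spawnSync(finder, [command], { stdio: "ignore" });
  return result.status === 0;
}

// Lists the tools that could not be found, e.g. "unrtf (for rtf)".
function missingTextractTools(tools = TEXTRACT_TOOLS) {
  return Object.entries(tools)
    .filter(([, command]) => !isOnPath(command))
    .map(([kind, command]) => `${command} (for ${kind})`);
}
```

For example, logging `missingTextractTools()` when a cloud function starts would show which converters its runtime lacks, which is exactly the situation described above.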

https://www.npmjs.com/package/textract

Upvotes: 0

Reece

Reputation: 764

I just ran this through tesseract, and I got absolute gibberish back.

Tesseract really isn't equipped to process that kind of image, especially without any pre-processing of the image.
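One common pre-processing step is binarisation: converting the image to pure black and white before handing it to tesseract. The sketch below uses Otsu's method as an example technique; it is not guaranteed to help with this particular photo. It assumes the image has already been decoded into raw RGBA bytes (4 bytes per pixel) by an image library, which is not shown, and that the result is re-encoded to a file the same way; `toGrayscale`, `otsuThreshold` and `binarize` are hypothetical helper names.

```javascript
// Convert RGBA pixels (4 bytes per pixel) to 8-bit grey using the
// ITU-R BT.601 luma weights; the alpha channel is ignored.
function toGrayscale(rgba) {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Otsu's method: choose the threshold that maximises the
// between-class variance of the "dark" and "light" pixel groups.
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v]++;
  const total = gray.length;
  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumBg = 0;
  let weightBg = 0;
  let best = 0;
  let bestVar = -1;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue; // no dark pixels yet
    const weightFg = total - weightBg;
    if (weightFg === 0) break; // every pixel is now "dark"
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const betweenVar = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (betweenVar > bestVar) {
      bestVar = betweenVar;
      best = t;
    }
  }
  return best;
}

// Pixels above the threshold become white (255), the rest black (0).
function binarize(gray, threshold = otsuThreshold(gray)) {
  return gray.map((v) => (v > threshold ? 255 : 0));
}
```

Usage would be along the lines of `binarize(toGrayscale(rgbaBytes))`, followed by writing the single-channel result back out as a PNG for tesseract.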

I don't think you'll find anything open source that can deal with that image.

Maybe give the Google Vision API a go: https://cloud.google.com/vision/docs/
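For reference, the Vision API exposes text recognition as the `TEXT_DETECTION` feature of its `images:annotate` REST method. The sketch below is a minimal, untested example assuming an API key with the Vision API enabled and Node 18+ (for the global `fetch`); `buildTextDetectionRequest`, `extractText` and `detectText` are hypothetical helper names, and error handling is kept deliberately thin.

```javascript
// JSON body for a single TEXT_DETECTION request on a publicly reachable image URL.
function buildTextDetectionRequest(imageUri) {
  return {
    requests: [
      {
        image: { source: { imageUri } },
        features: [{ type: "TEXT_DETECTION" }],
      },
    ],
  };
}

// Pull the detected text out of an images:annotate response.
// The first textAnnotations entry holds the whole detected text block;
// later entries are individual words.
function extractText(response) {
  const first = response.responses && response.responses[0];
  if (!first) return "";
  if (first.error) throw new Error(first.error.message);
  const annotations = first.textAnnotations || [];
  return annotations.length > 0 ? annotations[0].description : "";
}

// POST the request with an API key (assumption: key-based auth is enabled).
async function detectText(imageUri, apiKey) {
  const url =
    "https://vision.googleapis.com/v1/images:annotate?key=" +
    encodeURIComponent(apiKey);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTextDetectionRequest(imageUri)),
  });
  if (!res.ok) throw new Error(`Vision API returned HTTP ${res.status}`);
  return extractText(await res.json());
}
```

Something like `detectText(imageUrl, process.env.VISION_API_KEY).then(console.log)` would print whatever the service recognises. If Google cannot fetch the URL itself, the image bytes can instead be sent base64-encoded in `image.content`.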

Otherwise, if you are willing to invest more time in tesseract, I suggest reading the tesseract wiki to try to improve your results: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
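One setting that page discusses is the page segmentation mode. For a photo with a few isolated words, sparse-text mode (`--psm 11`) may be worth trying, although this sketch has not been tested against the image in the question. It assumes tesseract 4 or later is installed and on the PATH (3.x spelled the option `-psm`), and it calls the CLI directly from Node rather than through an npm wrapper; `tesseractArgs` and `ocr` are hypothetical helper names.

```javascript
const { execFile } = require("node:child_process");

// Build the CLI arguments. Passing "stdout" as the output base makes
// tesseract print the recognised text instead of writing a .txt file.
// psm 11 = sparse text: find as much text as possible, in no particular order.
function tesseractArgs(imagePath, { psm = 11, lang = "eng" } = {}) {
  return [imagePath, "stdout", "-l", lang, "--psm", String(psm)];
}

// Run tesseract and resolve with the trimmed text it printed.
function ocr(imagePath, options) {
  return new Promise((resolve, reject) => {
    execFile("tesseract", tesseractArgs(imagePath, options), (err, stdout) => {
      if (err) reject(err);
      else resolve(stdout.trim());
    });
  });
}
```

For example, `ocr("jersey.png").then(console.log)` runs one pass in sparse-text mode; other modes can be compared by passing `{ psm: 6 }` and similar. Combining this with pre-processing is where most of the wiki's advice applies.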

Upvotes: 2
