Reputation: 33
I have a translation website, and I would like to parse PDF files so that I can count the words and set the price for translation.
I have tried Poppler JS before, but it can't handle scanned files. How should I handle them?
For example, this PDF is a scanned article. It is a PDF file, but each page is an image, and I need to extract the text from it.
Upvotes: 3
Views: 5447
Reputation: 1847
What you are looking for is an OCR (optical character recognition) library. There are many options for this; here are some Software Recommendations Stack Exchange links:
Scan Text Document To PDF With OCR
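As a rough illustration of the OCR-then-count workflow, here is a minimal Python sketch. It assumes the `pdf2image` and `pytesseract` packages (with the Poppler and Tesseract binaries installed on the machine), which are only one possible choice and are not something the question requires. The function names, the word pattern, and the per-word rate are hypothetical and would need adapting to your pricing rules.

```python
import re
from decimal import Decimal

# A "word" here is a run of letters/digits, optionally joined by an
# apostrophe or hyphen (e.g. "it's", "well-known"). This is an assumption;
# adjust it to match how you bill for words.
_WORD = re.compile(r"\w+(?:['’-]\w+)*")


def ocr_pdf_to_text(pdf_path, lang="eng", dpi=300):
    """Render each PDF page to an image and run OCR on it."""
    # Imported lazily so the counting helpers below work even when the
    # OCR packages are not installed.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)
    return "\n".join(
        pytesseract.image_to_string(page, lang=lang) for page in pages
    )


def count_words(text):
    """Count the words in a block of extracted text."""
    return len(_WORD.findall(text))


def quote_price(text, rate_per_word):
    """Return the price for translating `text` at a per-word rate."""
    return count_words(text) * rate_per_word
```

With these helpers, something like `quote_price(ocr_pdf_to_text("scanned.pdf"), Decimal("0.10"))` would give an estimate, where `scanned.pdf` stands in for your uploaded file. OCR output on scanned pages tends to contain some misread characters, so it is probably safer to treat the resulting count as approximate.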
Upvotes: 1