What is the best way to parse a scanned PDF file using PHP or JS?

Question

I have a translation website, and I would like to parse PDF files so that I can count the words and set the price for the translation.
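A minimal sketch of the counting-and-pricing step, assuming the text has already been extracted. The names `countWords` and `quoteCents`, the per-word rate and the minimum charge are all hypothetical placeholders, not values from the question. Words are counted as whitespace-separated tokens, which is reasonable for languages like English but undercounts languages that don't put spaces between words (Chinese, Japanese).

```typescript
// Hypothetical pricing: a per-word rate in cents (integers avoid floating-point
// rounding) plus a minimum charge. Both values are placeholders, not real rates.
const RATE_PER_WORD_CENTS = 8;
const MINIMUM_CHARGE_CENTS = 2000;

// Counts whitespace-separated tokens. Works for languages that separate words
// with spaces, but undercounts languages such as Chinese or Japanese.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns the quote in cents: word count times the per-word rate, but never
// less than the minimum charge.
function quoteCents(text: string): number {
  return Math.max(countWords(text) * RATE_PER_WORD_CENTS, MINIMUM_CHARGE_CENTS);
}
```

Keeping money in integer cents means `quoteCents(text) / 100` is only done at display time, so repeated additions never drift.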

I have tried Poppler JS before, but it can't handle scanned files. How should I handle them?

For example, this PDF is a scanned article. It is a PDF file, but each page is an image, and I need to extract the text.
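A text extractor like Poppler returns little or nothing for such pages because there is no text layer to read, only an embedded image. One cheap way to decide which pages need OCR is to look at how much text the extractor actually returned per page. This is a heuristic sketch, assuming you already have one string per page from your existing extractor; the function name `pagesNeedingOcr` and the 20-character threshold are hypothetical and would need tuning on real documents.

```typescript
// Hypothetical threshold: a page whose text layer has fewer than this many
// non-whitespace characters is assumed to be an image-only (scanned) page.
const MIN_TEXT_CHARS = 20;

// Given the text a text-layer extractor returned for each page, list the
// zero-based indices of pages that probably need OCR.
function pagesNeedingOcr(pageTexts: string[], minChars: number = MIN_TEXT_CHARS): number[] {
  const result: number[] = [];
  pageTexts.forEach((text, index) => {
    // Ignore whitespace so pages containing only line breaks count as empty.
    const visibleChars = text.replace(/\s/g, "").length;
    if (visibleChars < minChars) {
      result.push(index);
    }
  });
  return result;
}
```

Checking per page rather than per file matters because a single PDF can mix normal text pages with scanned ones.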

hcham1 · Accepted Answer

What you are looking for is an OCR (optical character recognition) library. There are many options for this; here are some links from Software Recommendations Stack Exchange:

Scan Text Document To PDF With OCR
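One possible setup (an assumption, not something the links above prescribe) is to keep the normal text extraction and fall back to OCR only for pages without a usable text layer. Tesseract is a widely used open-source OCR engine: tesseract.js wraps it for JavaScript, and from PHP you can call the `tesseract` command-line tool. Either way, a PDF page usually has to be rendered to an image first (for example with Poppler's `pdftoppm`). The sketch below keeps the engine out of the code by taking the OCR step as an injected callback; `PageOcr`, `extractTextWithOcrFallback` and the 20-character threshold are hypothetical names and values.

```typescript
// An OCR function for one page: takes a zero-based page index and resolves to
// the recognised text. In a real application it would render the page to an
// image and pass it to an OCR engine; here it is a hypothetical callback.
type PageOcr = (pageIndex: number) => Promise<string>;

// Returns one string per page: the existing text layer when it looks usable,
// otherwise the OCR result. Pages are processed sequentially so only one
// (potentially memory-hungry) OCR job runs at a time.
async function extractTextWithOcrFallback(
  pageTexts: string[],
  ocrPage: PageOcr,
  minChars: number = 20, // hypothetical cut-off for "no usable text layer"
): Promise<string[]> {
  const output: string[] = [];
  for (let i = 0; i < pageTexts.length; i++) {
    const hasTextLayer = pageTexts[i].replace(/\s/g, "").length >= minChars;
    output.push(hasTextLayer ? pageTexts[i] : await ocrPage(i));
  }
  return output;
}
```

The joined result can then go through the same word count as a normal PDF. Keep in mind that OCR output contains recognition errors, so a count based on it is an estimate.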
