Tayfun Yazici

Reputation: 33

What is the best way to parse a scanned PDF file using PHP or JS?

I have a translation website, and I would like to parse PDF files so that I can count the words and set the price for the translation.

I have tried Poppler JS before, but it can't handle scanned files. How should I handle them?

For example, this PDF is a scanned article. It is a PDF file, but each page is an image, and I need to extract the text from it.

Upvotes: 3

Views: 5447

Answers (1)

hcham1

Reputation: 1847

What you are looking for is an OCR (optical character recognition) library. There are several options; these Software Recommendations Stack Exchange questions are a good starting point:

Scan Text Document To PDF With OCR

JavaScript library for OCR
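
Once an OCR engine has turned the scanned pages into plain text, the word count and the price are straightforward. Here is a minimal sketch of that last step, assuming the OCR output is already available as a string. The names `countWords` and `quote` and the flat per-word rate are hypothetical, chosen only for illustration; the OCR call itself is not shown because it depends on which library you pick.

```javascript
// Count words in text that an OCR engine has already extracted from the scanned pages.
// Assumption: a "word" is a run of letters or digits (in any script), optionally
// joined by apostrophes or hyphens. Stray punctuation from OCR noise is ignored.
function countWords(text) {
  const matches = text.match(/[\p{L}\p{N}]+(?:['’-][\p{L}\p{N}]+)*/gu);
  return matches ? matches.length : 0;
}

// Hypothetical pricing rule: a flat rate per word, rounded to two decimals.
function quote(text, ratePerWord) {
  const words = countWords(text);
  const price = Math.round(words * ratePerWord * 100) / 100;
  return { words, price };
}
```

You would call it as `quote(ocrText, 0.08)`, where `ocrText` is whatever your OCR step returned. Scanned documents often produce recognition errors, so the count is probably best treated as an estimate rather than an exact figure.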

Upvotes: 1
