Slinky

Reputation: 5832

Determine if PDF file has searchable text in PHP

We have hundreds of PDF files on a server. Some of them contain searchable text and others do not.

I was asked to find out which are searchable and which are not.

Does anybody know of a way to read in a bunch of PDFs and determine whether each document contains searchable/selectable text, or only contains text that is not selectable/searchable and needs to be OCR'd?

I don't even need to read the text itself; I just need to detect, possibly by tags or keywords, something in the raw data that suggests fonts or similar are present.

Are there tags in a searchable PDF that make it easy to detect?

Thanks

Upvotes: 3

Views: 3027

Answers (1)

miah

Reputation: 10433

I believe you could modify this code (pdf2text) to suit your purposes. This answer might also point you in the right direction.
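As a rough starting point for the "tags or keywords" idea from the question, here is a minimal sketch that scans the raw bytes for a `/Font` entry. It is not the pdf2text code; the function names `pdfBytesSuggestText` and `pdfFileSuggestsText` are made up for illustration. It assumes the font dictionaries are stored uncompressed, which may not hold for PDFs that pack their objects into compressed object streams, so treat a negative result as "probably needs OCR" rather than proof.

```php
<?php
// Heuristic sketch: a PDF with a text layer normally declares at least one font,
// so a "/Font" name in the raw bytes hints at selectable text.
// Assumption: resource dictionaries are not hidden inside compressed object streams.

/**
 * Returns true if the raw PDF bytes contain a "/Font" name.
 * The \b keeps longer names such as "/FontDescriptor" from matching on their own.
 */
function pdfBytesSuggestText(string $bytes): bool
{
    return preg_match('/\/Font\b/', $bytes) === 1;
}

/**
 * Convenience wrapper (hypothetical name): load a file and apply the check above.
 */
function pdfFileSuggestsText(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return pdfBytesSuggestText($bytes);
}
```

You would loop over the files and call `pdfFileSuggestsText($path)` on each. Files that report false are the likely OCR candidates, but it is worth spot-checking a few of them by hand, since this only looks for a font declaration and never extracts any text.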

Upvotes: 1
