npiv

Reputation: 1837

Read text from a searchable PDF, without OCR

I'm currently using my scanner to turn my PDFs into searchable PDFs. The OCR is already taken care of, since I can use Ctrl+F within the PDF.

How can I get at the OCR'd content from my program, though?

I'm open to using Java or Ruby; the question is more or less language-agnostic. Is the OCR'd text openly accessible by reading the file?

Upvotes: 0

Views: 300

Answers (1)

Mike Fahy

Reputation: 5707

I'm not sure how your OCR software creates the PDF, but could you use a third-party library such as JPedal or iText, or a tool such as Xpdf, to extract the text from the resulting PDF?
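
Since Ctrl+F already works, the PDF should carry a text layer that these tools can read. As a rough illustration of the Xpdf route, here is a minimal Java sketch that shells out to the `pdftotext` command-line tool that ships with Xpdf. It assumes `pdftotext` is installed and on your PATH; the class name `PdfTextExtractor`, its helper methods, and the file name `scan.pdf` are made up for this example and are not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper class for illustration only (requires Java 9+).
final class PdfTextExtractor {

    // Builds the pdftotext command line.
    // Passing "-" as the output file makes pdftotext write to stdout.
    static List<String> buildCommand(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Runs pdftotext and returns whatever text it found in the PDF.
    // Assumption: the pdftotext binary is available on the PATH.
    static String extractText(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf))
                // Discard stderr so a full error pipe cannot block the child.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // "scan.pdf" is a placeholder; pass your own file as the first argument.
        Path pdf = Paths.get(args.length > 0 ? args[0] : "scan.pdf");
        System.out.print(extractText(pdf));
    }
}
```

If the output comes back empty, the scanner probably stored only page images, and you would need to run OCR after all. The same idea carries over to Ruby or any other language that can start a subprocess.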

Upvotes: 1
