Fluffy

Reputation: 28402

How to open PDF and read it?

How can I open a PDF file and read some of its contents with Python (preferred, though Ruby, Perl or PHP are fine too), provided the text is recognized rather than stored as an image, or otherwise report that it's impossible without OCR? TIA

Update: thanks for the solutions, I'm sure some of them will suit me fine.

@RichH, I have a PDF file and don't know whether it is image-based or text-based. I'm looking for a tool to help me find that out and, if it's text-based, extract some of its contents.

Upvotes: 2

Views: 2030

Answers (2)

Ether

Reputation: 54004

For Perl, check out these modules:

Upvotes: 5

johannes

Reputation: 15989

Parsing a PDF and making something useful out of it is hard, because the format is focused on preserving the layout. Text can be stored so that each letter is positioned individually, and depending on the font, the text might also be stored as graphics.

Libraries for reading PDFs that I know of include the Zend Framework, whose PDF component includes a parser that can be used from PHP and gives more or less usable results, and the commercial PDFlib, which gives quite usable results and offers bindings for different languages.
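
Since Python was the preferred language, here is a rough sketch of one way to tell a text-based PDF from an image-only one: try to extract text from each page and see whether anything meaningful comes back. It assumes the third-party pypdf package is installed; the function names and the 20-character threshold are made up for illustration, and the check is only a heuristic (a scanned PDF with a hidden OCR layer would count as text-based, and oddly encoded fonts can yield garbage).

```python
def looks_text_based(page_texts, min_chars_per_page=20):
    """Heuristic: True if at least one page yielded a meaningful amount of text.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    return any(len(text.strip()) >= min_chars_per_page for text in page_texts)


def extract_page_texts(path):
    """Return the extracted text of every page, using pypdf (assumed installed)."""
    # Imported lazily so the heuristic above can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # Image-only pages typically give back an empty string.
    return [page.extract_text() or "" for page in reader.pages]


def describe_pdf(path):
    """Classify a PDF and return the classification together with the page texts."""
    texts = extract_page_texts(path)
    if looks_text_based(texts):
        return "text-based", texts
    return "probably image-only (OCR needed)", texts
```

Calling `describe_pdf("some.pdf")` (the file name is a placeholder) would give back the label plus the per-page text, so you can go on to search the extracted contents when the file turns out to be text-based.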

Upvotes: 1
