Reputation: 3085
I need to read certain parts of a complex PDF. I searched the net and some say FPDF is good, but it can't read PDFs, it can only write them. Is there a library out there that can extract specific content from a given PDF?
If not, what's a good way to read certain parts of a given PDF?
Thanks!
Upvotes: 5
Views: 14047
Reputation: 2153
Nowadays there is also https://github.com/smalot/pdfparser:
require 'vendor/autoload.php'; // assumes the library was installed with Composer
use Smalot\PdfParser\Parser;
$pdfParser = new Parser();
$pdf = $pdfParser->parseFile('../path/to/your.pdf');
$content = $pdf->getText();
// or, if you need to preserve the paragraph breaks
$content = preg_replace('/\s{3,}/m', "\n\n", trim($pdf->getText()));
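Since the question is about reading only certain parts, one possible follow-up is to slice the extracted text between two known markers. This is only a sketch under assumptions: extractBetween is a hypothetical helper, not part of smalot/pdfparser, and the sample string merely stands in for whatever getText() returns for your document.

```php
<?php
// Hypothetical helper (not part of smalot/pdfparser): returns the trimmed text
// found between two marker strings, or null if either marker is missing.
function extractBetween(string $text, string $start, string $end): ?string
{
    $from = strpos($text, $start);
    if ($from === false) {
        return null;
    }
    // Move past the start marker, then look for the end marker from there.
    $from += strlen($start);
    $to = strpos($text, $end, $from);
    if ($to === false) {
        return null;
    }
    return trim(substr($text, $from, $to - $from));
}

// Made-up sample text standing in for the output of $pdf->getText().
$content = "Invoice No: 12345\nCustomer: Jane Doe\nTotal: 99.00 EUR";
echo extractBetween($content, 'Customer:', "\n"), "\n"; // prints "Jane Doe"
```

This only works when the PDF text contains stable labels around the part you need; text extracted from complex layouts may come out in a different order than it appears on the page, so check the raw output first.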
Upvotes: 0
Reputation: 4363
I see two solutions here:
https://whatisprymas.wordpress.com/2010/04/28/lucene-how-to-index-pdf-files/ (archived version from 2012)
Upvotes: 2
Reputation: 37
$result = pdf2text ('sample.pdf');
echo "<pre>$result</pre>";
How to get “clean” text (source code for pdf2text):
http://webcheatsheet.com/php/reading_clean_text_from_pdf.php
Upvotes: 1
Reputation: 61
What about this?
http://www.phpclasses.org/package/702-PHP-Searches-pdf-documents-for-text.html
PS: I haven't tested this class; I only read the description.
Upvotes: 0