EOB

Reputation: 3085

Read the content of a PDF with PHP?

I need to read certain parts of a complex PDF. I searched the net and some say FPDF is good, but it can't read PDFs; it can only write them. Is there a library that can extract specific content from a given PDF?

If not, what's a good way to read certain parts of a given PDF?

Thanks!

Upvotes: 5

Views: 14047

Answers (4)

Andreas

Reputation: 2153

Nowadays there is also https://github.com/smalot/pdfparser:

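// Assumes the library was installed with Composer (smalot/pdfparser)
require 'vendor/autoload.php';

use Smalot\PdfParser\Parser;

$pdfParser = new Parser();
$pdf = $pdfParser->parseFile('../path/to/your.pdf');

// Plain text of the whole document
$content = $pdf->getText();

// or, if you need to maintain the paragraphs
$content = preg_replace('/\s{3,}/m', "\n\n", trim($pdf->getText()));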
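
Since the question asks for certain parts rather than the whole document, here is a minimal sketch of one way to pull a fragment out of the extracted text. It assumes the part you want sits between two known marker strings. The helper name extractBetween and the sample text are made up for illustration and are not part of the library.

```php
<?php
declare(strict_types=1);

/**
 * Return the text found between two marker strings, or null if either
 * marker is missing. This helper is hypothetical; choose markers that
 * actually appear in your extracted PDF text.
 */
function extractBetween(string $text, string $startMarker, string $endMarker): ?string
{
    $start = strpos($text, $startMarker);
    if ($start === false) {
        return null;
    }
    // Skip past the start marker itself.
    $start += strlen($startMarker);

    $end = strpos($text, $endMarker, $start);
    if ($end === false) {
        return null;
    }

    return trim(substr($text, $start, $end - $start));
}

// Stand-in for the string returned by $pdf->getText().
$content = "Invoice No: 2024-001\nTotal: 42.00 EUR\nThank you";

echo extractBetween($content, 'Total:', "\n"), PHP_EOL; // prints "42.00 EUR"
```

Plain string markers only work when the extracted text is predictable. For a complex layout, the order in which text comes out of the PDF may differ from what you see on the page, so check the raw output first.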

Upvotes: 0

greut

Reputation: 4363

I see two solutions here:

  • Convert the PDF file into another format first, such as text or HTML.
  • Use a library that reads PDFs directly. The bad news here is that most of them are written in Java.

https://whatisprymas.wordpress.com/2010/04/28/lucene-how-to-index-pdf-files/ (archived version from 2012)
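
The first option can be sketched from PHP by calling an external converter. This is only a rough example and rests on several assumptions: the pdftotext command-line tool (from Poppler or Xpdf) is installed and on the PATH, the server is Unix-like, and exec() is not disabled. The function names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build the shell command that asks pdftotext to write to stdout.
 * "-layout" keeps the physical layout of the page; the trailing "-"
 * means "write to standard output instead of a file".
 */
function buildPdfToTextCommand(string $pdfPath): string
{
    return 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' -';
}

/**
 * Convert a PDF to plain text, or return null if the conversion failed.
 * Assumes a Unix-like shell (stderr is discarded via /dev/null).
 */
function pdfToText(string $pdfPath): ?string
{
    $output = [];
    $exitCode = 0;
    exec(buildPdfToTextCommand($pdfPath) . ' 2>/dev/null', $output, $exitCode);

    // A non-zero exit code means pdftotext is missing or could not read the file.
    return $exitCode === 0 ? implode("\n", $output) : null;
}

// Usage (the file name is a placeholder):
// $text = pdfToText('sample.pdf');
```

Once the text is available as a string, the usual PHP string and regex functions can be used to pick out the parts you need.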

Upvotes: 2

Stoufa

Reputation: 37

// pdf2text() is not built into PHP; it is defined in the source code linked below
$result = pdf2text('sample.pdf');
echo "<pre>$result</pre>";

Source code for pdf2text, from "How to get “clean” text":
http://webcheatsheet.com/php/reading_clean_text_from_pdf.php

Upvotes: 1

kim pastro

Reputation: 61

What about this?

http://www.phpclasses.org/package/702-PHP-Searches-pdf-documents-for-text.html

PS: I haven't tested this class; I only read the description.

Upvotes: 0
