Henrique Ferreira

Reputation: 143

How is it possible to extract bookmarks from a PDF File in PHP using Smalot/PDFParser?

I'm currently working with PHP and Laravel. My goal is to extract as much information as possible from an uploaded PDF file (submitted through a form via POST): metadata (author, title, etc.), the first page (cover), the content of each page, and the available chapters (from the bookmarks).

I'm currently using smalot's PDF Parser, available here, but the documentation only covers basic examples for information I have already managed to extract from the PDF file.

Question: My current problem is extracting these bookmarks in order to fulfill the chapters requirement. Does anyone know how to extract this type of content using this specific parser?

My code at the moment looks like this:

<table>
    <?php
        $details  = $PDFfile->getDetails();
        // Loop over each property to extract values (string or array).
        foreach ($details as $property => $value) {
            if (is_array($value)) {
                $value = implode(', ', $value);
            }
            echo '<tr>';
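            // Escape both values, since they come from an uploaded file.
            echo '<td><b>' . htmlspecialchars((string) $property) . '</b></td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td><td>' . htmlspecialchars((string) $value) . '</td>';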
            echo '</tr>';
        }
    ?>
</table>

Note that this only produces output like the following:

[Producer] => dvips + GNU Ghostscript 7.05
[Creator] => LaTeX with hyperref package
[Title] => 
[Subject] => 
[Author] => 
[Keywords] => 
[Pages] => 11

Upvotes: 0

Views: 1004

Answers (1)

Patrick Gallot

Reputation: 625

I have no experience with Smalot, but I do have some experience extracting information from PDF bookmarks. Based on section 12.3.3 of the PDF reference and the Smalot documentation, I would start from the Document's getDictionary(), get the 'Outlines' entry from that dictionary, and then walk the tree using the First, Next, Title, Last, and Count entries.
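To make the tree walk concrete, here is a minimal sketch in plain PHP. It does not call Smalot at all: the `$objects` array is a hypothetical stand-in for the parsed PDF (object id => dictionary entries, with First and Next stored as plain integer ids), and `flattenOutline()` is a name made up for illustration. How Smalot actually exposes these references is something I have not verified, so mapping this onto its objects is left as an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Flatten a PDF outline (bookmark) tree into rows of depth and title.
 *
 * Assumption: $objects maps object id => dictionary entries, and the
 * First/Next entries hold the integer id of the referenced outline item.
 */
function flattenOutline(array $objects, int $rootId): array
{
    $rows = [];
    $visited = []; // guards against malformed files with cyclic Next/First links
    $stack = [];

    // The outline root has no Title; its First entry points at the first top-level item.
    if (isset($objects[$rootId]['First'])) {
        $stack[] = [$objects[$rootId]['First'], 0];
    }

    while ($stack) {
        [$id, $depth] = array_pop($stack);
        if (isset($visited[$id]) || !isset($objects[$id])) {
            continue;
        }
        $visited[$id] = true;

        $item = $objects[$id];
        $rows[] = ['depth' => $depth, 'title' => $item['Title'] ?? ''];

        // Push the sibling first so the children (pushed last) are visited first,
        // which yields the bookmarks in reading order.
        if (isset($item['Next'])) {
            $stack[] = [$item['Next'], $depth];
        }
        if (isset($item['First'])) {
            $stack[] = [$item['First'], $depth + 1];
        }
    }

    return $rows;
}

// Hypothetical sample data: two chapters, the first with one section.
$objects = [
    1 => ['Type' => 'Outlines', 'First' => 2, 'Last' => 3, 'Count' => 3],
    2 => ['Title' => 'Chapter 1', 'Parent' => 1, 'Next' => 3, 'First' => 4, 'Last' => 4, 'Count' => 1],
    3 => ['Title' => 'Chapter 2', 'Parent' => 1, 'Prev' => 2],
    4 => ['Title' => 'Section 1.1', 'Parent' => 2],
];

$rows = flattenOutline($objects, 1);

foreach ($rows as $row) {
    echo str_repeat('  ', $row['depth']) . $row['title'] . PHP_EOL;
}
```

The walk only needs First and Next. Last and Count are redundant for a simple traversal, although Count can tell you whether an item is meant to be shown expanded or collapsed.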

Upvotes: 0
