Christian

Reputation: 25

Identifying Bookmarks using Python

I was looking into PyPDF2 in order to read bookmarks off a pdf.

Can anyone point me in the right direction on how to read bookmarks from a PDF and then split the PDF based on them? I am pretty sure I can figure out how to split once I know how to identify the bookmarks.

Thanks

Upvotes: 1

Views: 3107

Answers (2)

Ahaha

Reputation: 426

It took me quite a while to figure this out, so I put my answer here as it may help others.

The outline is a nested list of Destination objects (see the definition of the Destination class).

You can get the PDF outline using:

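from PyPDF2 import PdfFileReader

# pdf is the path to the PDF file (or an open binary file object)
reader = PdfFileReader(pdf)
outlines = reader.outlines  # nested list of Destination objects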

For each heading that has child headings, the parent heading appears as a Destination object, immediately followed by a list of Destination objects for its children.

parent_destination
[child_destination1, child_destination2, ......]

If a heading has no child headings, it is followed directly by its sibling Destination rather than by a list.

destination1
destination2
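
The nesting described above can be walked with a small recursive generator. This is only a minimal sketch: flatten_outline and FakeDestination are names I made up for illustration, and FakeDestination is a stand-in with just a title attribute so the example runs without a PDF. With a real file you would pass reader.outlines instead of sample.

```python
from collections import namedtuple

# Hypothetical stand-in for a Destination; only the title attribute is modelled.
FakeDestination = namedtuple("FakeDestination", ["title"])


def flatten_outline(outline, depth=0):
    """Yield (depth, item) pairs from a nested outline list."""
    for item in outline:
        if isinstance(item, list):
            # A list holds the children of the entry that came just before it.
            yield from flatten_outline(item, depth + 1)
        else:
            yield depth, item


# Same shape as described above: a parent followed by a list of its children.
sample = [
    FakeDestination("Chapter 1"),
    [FakeDestination("Section 1.1"), FakeDestination("Section 1.2")],
    FakeDestination("Chapter 2"),
]

for depth, dest in flatten_outline(sample):
    print("  " * depth + dest.title)
```

The depth value lets you decide which level to split on, for example only the entries with depth 0.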

Each Destination contains

  • title: the text content of a heading
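  • page: a reference to the target page (pass the Destination to reader.getDestinationPageNumber() to get the zero-based page number)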
  • other properties

which can be used to split the pdf.
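
As a rough sketch of the splitting step, the helper below turns bookmark start pages into page ranges. It assumes you have already collected a zero-based start page for each bookmark (for example via reader.getDestinationPageNumber(destination)) and that you know the total page count. page_ranges and the sample titles are hypothetical names used only for illustration.

```python
def page_ranges(starts, total_pages):
    """Turn (title, first_page) pairs into (title, first_page, end_page) tuples.

    Pages are zero-based and end_page is exclusive, so each section covers
    range(first_page, end_page).
    """
    ordered = sorted(starts, key=lambda pair: pair[1])
    ranges = []
    for index, (title, first) in enumerate(ordered):
        if index + 1 < len(ordered):
            # A section ends where the next bookmark starts.
            end = ordered[index + 1][1]
        else:
            # The last section runs to the end of the document.
            end = total_pages
        ranges.append((title, first, end))
    return ranges


# Made-up bookmark data for a 10-page document.
print(page_ranges([("Intro", 0), ("Methods", 3), ("Results", 7)], 10))
# prints [('Intro', 0, 3), ('Methods', 3, 7), ('Results', 7, 10)]
```

Each range can then be copied into its own writer object, one page at a time, and saved as a separate file. Note that two bookmarks pointing at the same page produce an empty range, which you may want to skip.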

Hope this helps.

Upvotes: 1

Leighner

Reputation: 193

It looks like PyPDF2 has the functionality you need. You might find what you need in this post.

Upvotes: 0
