Reputation: 25
I am looking into PyPDF2 in order to read the bookmarks from a PDF.
Can anyone point me in the right direction on how to read the bookmarks from a PDF and then split the PDF based on them? I am pretty sure I can figure out how to split it once I know how to identify the bookmarks.
Thanks
Upvotes: 1
Views: 3107
Reputation: 426
It took me quite a while to figure this out, so I am putting my answer here as it may help others.
The outline is a nested list of Destination objects (see the PyPDF2 documentation for the Destination class).
You can get the PDF outline using:
from PyPDF2 import PdfFileReader
# pdf is a path to the PDF file or an open binary file object
reader = PdfFileReader(pdf)
reader.outlines
A heading that has child headings appears as a Destination object immediately followed by a list of Destination objects, one for each child heading.
parent_destination
[child_destination1, child_destination2, ......]
A heading with no child headings is followed by its sibling Destination rather than by a list.
destination1
destination2
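As a rough sketch of how that nested structure can be walked, the generator below flattens an outline into (depth, entry) pairs. The name walk_outline and the FakeDestination stand-in are hypothetical and exist only so the example runs without a PDF; with a real file you would pass reader.outlines instead.

```python
from collections import namedtuple

# Hypothetical stand-in for PyPDF2's Destination so the sketch runs without a PDF.
FakeDestination = namedtuple("FakeDestination", ["title"])


def walk_outline(outline, depth=0):
    """Yield (depth, entry) for every entry in a nested outline list."""
    for item in outline:
        if isinstance(item, list):
            # A list holds the children of the entry that came just before it.
            yield from walk_outline(item, depth + 1)
        else:
            yield depth, item


# Same shape as described above: a parent followed by a list of its children.
outline = [
    FakeDestination("Chapter 1"),
    [FakeDestination("Section 1.1"), FakeDestination("Section 1.2")],
    FakeDestination("Chapter 2"),
]

flat = [(depth, entry.title) for depth, entry in walk_outline(outline)]
for depth, title in flat:
    print("  " * depth + title)
```

Keeping the depth alongside each entry makes it easy to split only on top-level headings, or on every heading, depending on what you need.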
Each Destination contains a title and a page reference, which can be used to split the PDF.
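For the splitting step, one possible approach is to turn the bookmark start pages into page ranges first. The helper below is a hypothetical sketch in plain Python; it assumes you already have zero-based start pages sorted in document order (in PyPDF2 1.x these could come from reader.getDestinationPageNumber(destination)) and the total page count (reader.getNumPages()).

```python
def page_ranges(starts, total_pages):
    """Turn (title, first_page) pairs into (title, first_page, end_page) ranges.

    end_page is exclusive; the last range runs to the end of the document.
    Assumes starts is sorted by page and pages are zero-based.
    """
    ranges = []
    for i, (title, first) in enumerate(starts):
        # Each section ends where the next one begins.
        end = starts[i + 1][1] if i + 1 < len(starts) else total_pages
        ranges.append((title, first, end))
    return ranges


# Illustrative values only, not taken from a real PDF.
starts = [("Chapter 1", 0), ("Chapter 2", 4), ("Chapter 3", 9)]
ranges = page_ranges(starts, total_pages=12)
for title, first, end in ranges:
    print(title, first, end)
```

Each range could then be copied into its own PdfFileWriter, adding reader.getPage(i) for every i in range(first, end) and writing the result to a separate file.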
Hope this helps.
Upvotes: 1