Reputation: 91
When I open a PDF in a PDF viewer, I see a series of bookmarks on the left side of the actual document. The information shown there doesn't seem to be part of the actual content of the document: it isn't printed, and it isn't present on any specific page.
How can I extract these bookmarks using Java?
Upvotes: 1
Views: 5015
Reputation: 2185
To retrieve the bookmark content from a PDF file with Java, you can use the pCOS interface of PDFlib+PDI 9. Sample code is included in the pCOS Cookbook: http://www.pdflib.com/en/pcos-cookbook/interactive-elements/bookmarks/
Upvotes: 2
Reputation: 90213
The OP asked for a solution in Java.
However, this may be a topic of more general interest to people who have to handle PDFs, so my answer offers a command-line solution: mutool.
mutool is a command-line utility bundled with the MuPDF viewer software, written by the same company that gave us Ghostscript.
Its latest version includes the show sub-command, which can be used to print outlines (the technical PDF term for what the OP and the Adobe UI call "bookmarks"), among other specific items of interest from a PDF:
$ mutool show PDF32000_2008.pdf outlines
Document management — Portable document format — Part 1: PDF 1.7 1
Contents Page 3
Foreword 6
Introduction 7
1 Scope 9
2 Conformance 9
2.1 General 9
2.2 Conforming readers 9
2.3 Conforming writers 9
2.4 Conforming products 10
3 Normative references 10
4 Terms and definitions 14
5 Notation 18
6 Version Designations 18
7 Syntax 19
7.1 General 19
7.2 Lexical Conventions 19
7.2.1 General 19
7.2.2 Character Set 20
7.2.3 Comments 21
[....]
(Output was shortened.) The original PDF document (the official PDF-1.7 specification) contains this page as the ToC:
You can clearly see how the /Outlines contents are different from (but similar to) the included table of contents page.
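Since the question asked for Java, one possible bridge is to call mutool from Java and read its output. The following is only a minimal sketch: it assumes mutool is installed and on the PATH, and that its output can be read as UTF-8. The class and method names (MutoolOutlines, buildCommand, readOutlines) are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class MutoolOutlines {

    // Builds the same command line as shown above: mutool show <pdf> outlines
    static List<String> buildCommand(String pdfPath) {
        return List.of("mutool", "show", pdfPath, "outlines");
    }

    // Runs mutool (assumed to be on the PATH) and returns its output line by line.
    static List<String> readOutlines(String pdfPath) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(pdfPath));
        builder.redirectErrorStream(true); // merge stderr so error messages are captured too
        Process process = builder.start();

        List<String> lines = new ArrayList<>();
        // Assumption: the output is UTF-8 encoded.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("mutool exited with code " + exitCode + ": " + lines);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java MutoolOutlines <file.pdf>");
            return;
        }
        for (String line : readOutlines(args[0])) {
            System.out.println(line);
        }
    }
}
```

Passing the arguments as a list to ProcessBuilder avoids shell quoting problems with file names that contain spaces.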
Here is how the outlines ("bookmarks") are displayed in Adobe Reader XI:
Upvotes: 5
Reputation: 77528
Please download the free ebook The Best iText Questions on StackOverflow. In that book, you'll find the answers to many questions, including the question Reading PDF Bookmarks in VB.NET using iTextSharp.
The coolest way to extract bookmarks is to create an XML file that shows the bookmarks in a nice hierarchical way:
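// iText 5 classes: com.itextpdf.text.pdf.PdfReader and com.itextpdf.text.pdf.SimpleBookmark
// src is the path of the input PDF, dest the path of the XML file to write
PdfReader reader = new PdfReader(src);
// One map per bookmark entry
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
// The final argument (true) asks for ASCII-only output, with other characters escaped
SimpleBookmark.exportToXML(list,
        new FileOutputStream(dest), "ISO8859-1", true);
reader.close();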
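If you would rather work with the bookmarks in memory than export them to XML, the list can be walked recursively. The sketch below uses only plain Java collections and a hand-built sample list so that it runs without iText. It assumes that each map stores its text under the key "Title" and its nested entries under the key "Kids", and that the list may be null when a PDF has no outlines; please verify those details against your iText version. The class name BookmarkTitles and its methods are hypothetical. String.repeat needs Java 11 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class BookmarkTitles {

    // Returns all titles depth-first, indenting each nesting level by two spaces.
    static List<String> titles(List<HashMap<String, Object>> bookmarks) {
        List<String> out = new ArrayList<>();
        collect(bookmarks, 0, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(List<HashMap<String, Object>> bookmarks, int depth, List<String> out) {
        if (bookmarks == null) {
            return; // assumption: a document without outlines yields null
        }
        for (HashMap<String, Object> entry : bookmarks) {
            // Assumed key names: "Title" for the text, "Kids" for nested entries
            out.add("  ".repeat(depth) + entry.get("Title"));
            Object kids = entry.get("Kids");
            if (kids instanceof List) {
                collect((List<HashMap<String, Object>>) kids, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        // Hand-built sample standing in for the result of SimpleBookmark.getBookmark(reader)
        HashMap<String, Object> child = new HashMap<>();
        child.put("Title", "7.1 General");
        List<HashMap<String, Object>> kids = new ArrayList<>();
        kids.add(child);

        HashMap<String, Object> parent = new HashMap<>();
        parent.put("Title", "7 Syntax");
        parent.put("Kids", kids);
        List<HashMap<String, Object>> top = new ArrayList<>();
        top.add(parent);

        for (String line : titles(top)) {
            System.out.println(line);
        }
    }
}
```

With real data, you would pass the list returned by SimpleBookmark.getBookmark(reader) to titles(...) in place of the sample.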
Upvotes: 3