Reputation: 11762
I am searching for a command line tool to add bookmarks to a PDF file.
What I have is a page number and a label. I would love to create a bookmark called label that links to page number.
Does anyone know a command line tool (preferably for OS X) for doing this?
I have a PDF file of about 4000 pages and about 150 bookmarks, and would love to automate this.
My plan is to use a system call from within an R script.
EDIT
I create about 4000 single PDF files with graphs and use the OS X system command /System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py to join the PDFs together. Previously I used pdfjoin from the pdfjam package, but this was way too slow. That is how I end up with my PDF, to which I currently add the bookmarks by hand with Adobe Acrobat Professional.
Upvotes: 25
Views: 16881
Reputation: 788
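Another way to do this is to use cpdf.
Create the table of contents in a text file, one entry per line, in the format level "title" page <open>, where level is the nesting depth (0 for top-level entries), title is the bookmark text in double quotes, page is the target page number, and the optional open marks the entry as expanded.
For instance: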
0 "Introduction" 5 open
1 "History" 6
0 "Main Content" 10
1 "Sub-topic 1" 11
Then, add the table of contents:
cpdf -add-bookmarks toc_cpdf.txt in.pdf -o out.pdf
This takes in.pdf, uses toc_cpdf.txt for bookmarks, and writes the bookmarked PDF to out.pdf. Now, out.pdf will have bookmarks according to the provided table of contents.
For more details on cpdf, consider the cpdf usage examples.
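With about 150 bookmarks, the table-of-contents file could be generated rather than typed. The following is only a sketch: the helper names cpdf_bookmark_lines and add_bookmarks are made up for illustration, and the backslash-escaping of quotes in titles is an assumption about how cpdf reads titles that should be checked against the cpdf manual.

```python
import subprocess


def cpdf_bookmark_lines(bookmarks):
    """Format (level, title, page) tuples as cpdf bookmark entries."""
    lines = []
    for level, title, page in bookmarks:
        # Assumption: backslashes and double quotes inside a title
        # are written with a leading backslash.
        safe = title.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{level} "{safe}" {page}')
    return lines


def add_bookmarks(bookmarks, in_pdf, out_pdf, toc_path="toc_cpdf.txt"):
    """Write the entries to toc_path and run the cpdf command shown above."""
    with open(toc_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(cpdf_bookmark_lines(bookmarks)) + "\n")
    subprocess.run(
        ["cpdf", "-add-bookmarks", toc_path, in_pdf, "-o", out_pdf],
        check=True,
    )
```

Calling add_bookmarks([(0, "Introduction", 5), (1, "History", 6)], "in.pdf", "out.pdf") would write the first two example entries and then invoke cpdf, provided cpdf is on the PATH.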
Upvotes: 4
Reputation: 2729
Here's a Python method for adding bookmarks to the PDF's outline (table of contents). It runs on macOS without any other installations.
#!/usr/bin/python
from Foundation import NSURL, NSString
import Quartz as Quartz
import sys
# You will need to change these filepaths to a local test pdf and an output file.
infile = "/path/to/file.pdf"
outfile = "/path/to/output.pdf"
def getOutline(page, label):
    # Create Destination
    myPage = myPDF.pageAtIndex_(page)
    pageSize = myPage.boundsForBox_(Quartz.kCGPDFMediaBox)
    x = 0
    y = Quartz.CGRectGetMaxY(pageSize)
    pagePoint = Quartz.CGPointMake(x,y)
    myDestination = Quartz.PDFDestination.alloc().initWithPage_atPoint_(myPage, pagePoint)
    myLabel = NSString.stringWithString_(label)
    myOutline = Quartz.PDFOutline.alloc().init()
    myOutline.setLabel_(myLabel)
    myOutline.setDestination_(myDestination)
    return myOutline

pdfURL = NSURL.fileURLWithPath_(infile)
myPDF = Quartz.PDFDocument.alloc().initWithURL_(pdfURL)
if myPDF:
    # Here's where you list your page index (starts at 0) and label.
    outline1 = getOutline(0, 'Page 1')
    outline2 = getOutline(1, 'Page 2')
    outline3 = getOutline(2, 'Page 3')

    # Create a root Outline and add each outline. (Needs a loop.)
    rootOutline = Quartz.PDFOutline.alloc().init()
    rootOutline.insertChild_atIndex_(outline1, 0)
    rootOutline.insertChild_atIndex_(outline2, 1)
    rootOutline.insertChild_atIndex_(outline3, 2)
    myPDF.setOutlineRoot_(rootOutline)
    myPDF.writeToFile_(outfile)
Upvotes: 4
Reputation: 90253
You can also use pdftk. It is also available for OS X.
I'm not going through all the details here and now, because it's been done elsewhere at great length already. Just briefly:
Run one of these commands:
pdftk my.pdf dump_data output -
pdftk my.pdf dump_data output bookmarks+otherdata.txt
Study the format of the output.
Run PDFTK again:
pdftk my.pdf update_info bookmarks.txt output bookmarked.pdf
Additional Information
This is the bookmark format I noticed after inspecting the dump_data output from the steps above.
BookmarkBegin
BookmarkTitle: -- Your Title 1 --
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: -- Your Title 2 --
BookmarkLevel: 1
BookmarkPageNumber: 2
BookmarkBegin
BookmarkTitle: -- Your Title 3 --
...
and so on. Insert your own entries in this format at the appropriate place in the data file.
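Since the bookmark block is so regular, it could be generated from a list of labels and page numbers. This is a minimal sketch that assumes all bookmarks sit at the same level; the function name pdftk_bookmark_block is hypothetical and not part of pdftk.

```python
def pdftk_bookmark_block(bookmarks, level=1):
    """Build pdftk bookmark records from (title, page) pairs."""
    lines = []
    for title, page in bookmarks:
        lines += [
            "BookmarkBegin",
            f"BookmarkTitle: {title}",
            f"BookmarkLevel: {level}",
            f"BookmarkPageNumber: {page}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the records for two example bookmarks.
    print(pdftk_bookmark_block([("Your Title 1", 1), ("Your Title 2", 2)]), end="")
```

The returned text can be saved as bookmarks.txt and passed to the update_info command shown above.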
Upvotes: 30
Reputation: 90253
Here is another answer. This one uses Ghostscript to process PDF-to-PDF and the pdfmark PostScript operator to insert the bookmarks.
For some introduction to the pdfmark topic, see also:
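This method involves two steps: first, create a text file with pdfmark commands, one per line for each bookmark you want to add; second, run Ghostscript on that text file together with the original PDF.
The content of the text file should look something like this: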
[/Page 1 /View [/XYZ null null null] /Title (This is page 1) /OUT pdfmark
[/Page 2 /View [/XYZ null null null] /Title (Dunno which page this is....) /OUT pdfmark
[/Page 3 /View [/XYZ null null null] /Title (Some other name) /OUT pdfmark
[/Page 4 /View [/XYZ null null null] /Title (File 4) /OUT pdfmark
[/Page 5 /View [/XYZ null null null] /Title (File 5) /OUT pdfmark
[/Page 6 /View [/XYZ null null null] /Title (File 6) /OUT pdfmark
[/Page 7 /View [/XYZ null null null] /Title (File 7) /OUT pdfmark
% more lines for more pages to bookmark...
[/Page 13 /View [/XYZ null null null] /Title (File 13) /OUT pdfmark
[/Page 14 /View [/XYZ null null null] /Title (Bookmark for page 14) /OUT pdfmark
% more lines for more pages to bookmark...
Name this file for example: addmybookmarks.txt
Now run this command:
gs -o bookmarked.pdf \
-sDEVICE=pdfwrite \
addmybookmarks.txt \
-f original.pdf
The resulting PDF, bookmarked.pdf, now contains the bookmarks.
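The pdfmark lines can also be produced by a small script. The sketch below assumes plain ASCII titles and escapes only the characters that are special inside a PostScript string (backslash and parentheses); the function names are made up for illustration.

```python
def pdfmark_line(title, page):
    """Return one bookmark pdfmark line for the given title and page."""
    # Backslashes and parentheses must be escaped inside a PostScript (...) string.
    safe = title.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"[/Page {page} /View [/XYZ null null null] /Title ({safe}) /OUT pdfmark"


def pdfmark_file_content(bookmarks):
    """Join the lines for a list of (title, page) pairs into file content."""
    return "\n".join(pdfmark_line(title, page) for title, page in bookmarks) + "\n"


if __name__ == "__main__":
    # Write a two-entry example file for use with the gs command above.
    with open("addmybookmarks.txt", "w", encoding="ascii") as fh:
        fh.write(pdfmark_file_content([("This is page 1", 1), ("File 4", 4)]))
```

Titles with non-ASCII characters are not handled here and would need a different string encoding.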
Upvotes: 12
Reputation: 90253
Ok, here is a quick'n'dirty way to do three jobs at once: merge the single PDFs, optionally generate a table of contents, and add the bookmarks.
It involves using a LaTeX installation.
You start with an empty LaTeX template like the following one:
\documentclass[]{article}
\usepackage{pdfpages}
\usepackage{hyperref}
\hypersetup{breaklinks=true,
bookmarks=true,
pdfauthor={},
pdftitle={},
colorlinks=true,
citecolor=blue,
urlcolor=blue,
linkcolor=magenta,
pdfborder={0 0 0}}
\begin{document}
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{3}
% Comment next line in or out if you want a ToC or not:
\tableofcontents
}
%% Here goes your additional code:
%% 1 line per included PDF!
\end{document}
Now just before the last line of this template, you insert one line per external PDF file you want to include.
In case you want to generate a ToC, it has to be formatted like this:
\includepdf[pages={<pagenumber>},addtotoc={<pagenumber>,<section>,<level>,<heading>,<label>}]{pdffilename.pdf}
In case you are sure that each and every included PDF is a 1-page document, it simplifies to this:
\includepdf[addtotoc={<pagenumber>,<section>,<level>,<heading>,<label>}]{pdffilename.pdf}
Here all of the following five parameters for addtotoc are required, in the order given, for the files to appear in the bookmarks and in the ToC. See further below for a specific example:
<pagenumber>: Number of the page of the inserted document to be linked to. (In your case always "1", because you insert 1-page documents only; you could insert a 5-page document and link to page 3 of the inserted PDF, though.)
<section>: The LaTeX sectioning name. Could be section, subsection, subsubsection... In your case "section".
<level>: The level of the LaTeX section. In your case "1".
<heading>: This is a string. Used for the text of the bookmark.
<label>: This must be unique for each bookmark. Used in the PDF internally to jump to the correct page when the bookmark is clicked.
To test this quickly, I used Ghostscript to generate 20 1-page PDF documents:
for i in {1..20}; do
gs -o p${i}.pdf -sDEVICE=pdfwrite \
-c "/Helvetica findfont 30 scalefont setfont \
100 600 moveto \
(Page ${i}) show \
showpage";
done
With these test files I could make the lines to insert into the template look like these:
\includepdf[addtotoc={1,section,1,Page 1 (First),p1}]{p1.pdf}
\includepdf[addtotoc={1,section,1,Page 2,p2}]{p2.pdf}
\includepdf[addtotoc={1,section,1,Page 3,p3}]{p3.pdf}
[...]
\includepdf[addtotoc={1,section,1,Page 11 (In the Middle),p11}]{p11.pdf}
[...]
\includepdf[addtotoc={1,section,1,Page 20 (Last),p20}]{p20.pdf}
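Lines like these follow a fixed pattern, so they could be generated from a list of files and headings. This is a rough sketch for the 1-page case only; the function name includepdf_lines is hypothetical, and headings containing commas are not handled.

```python
def includepdf_lines(entries):
    """Build one \\includepdf line per (filename, heading, label) tuple."""
    return [
        # Page 1, sectioning name "section", level 1, as in the examples above.
        f"\\includepdf[addtotoc={{1,section,1,{heading},{label}}}]{{{filename}}}"
        for filename, heading, label in entries
    ]


if __name__ == "__main__":
    # Print the lines for the 20 test files p1.pdf ... p20.pdf.
    entries = [(f"p{i}.pdf", f"Page {i}", f"p{i}") for i in range(1, 21)]
    print("\n".join(includepdf_lines(entries)))
```

The printed lines go into the template just before \end{document}.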
Save the template with the inserted lines, then run the following command twice:
pdflatex template.tex
pdflatex template.tex
The resulting file will have the bookmarks, which are visible in Preview.app.
Note: LaTeX is available for OSX via two methods:
I'll add one or two other methods to insert bookmarks on the command line too, later or in the next few days, if I have more time.
For now this one has to do, because I never showed it here on SO, AFAICR.
But I thought because you gave the background "I'm merging 1-page PDFs, and it is slow; now I want to add bookmarks too...", I could show how to do it with one single method.
HINT: One of the other methods will be to use pdftk, which IS available for Mac OS X!
Upvotes: 3