drmariod

Reputation: 11762

Create bookmarks into a PDF file via command line

I am searching for a command line tool to add bookmarks to a PDF file.

What I have is a page number and a label. I would love to create a bookmark named after the label that links to that page number.

Does anyone know a command line tool (preferably for OSX) for doing this?

I have PDF files of about 4000 pages with about 150 bookmarks and would love to automate this.

My plan is to use a system call from within an R script.

EDIT

I create about 4000 single PDF files with graphs and join them with the OSX system command /System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py. Previously I used pdfjoin from the pdfjam package, but it was far too slow. This gives me the combined PDF, to which I currently add the bookmarks by hand with Adobe Acrobat Professional.

Upvotes: 25

Views: 16881

Answers (5)

HaoZeke

Reputation: 788

Another way to do this is to use cpdf.

Create the table of contents in a text file, one entry per line, in the format level "title" page <open>, where:

  • level is 0 for main chapters, 1 for sub-chapters, etc.
  • title is the chapter or sub-chapter name.
  • page is the page number.
  • open is optional, denoting whether sections with subsections should be uncollapsed by default.

For instance:

0 "Introduction" 5 open
1 "History" 6
0 "Main Content" 10
1 "Sub-topic 1" 11

Then, add the table of contents:

cpdf -add-bookmarks toc_cpdf.txt in.pdf -o out.pdf

This takes in.pdf, uses toc_cpdf.txt for the bookmarks, and writes the result to out.pdf, which will then have bookmarks according to the provided table of contents.
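Since the question starts from page numbers and labels, the bookmark file can be generated rather than typed. Here is a minimal Python sketch, assuming all bookmarks are top-level (level 0). The function name cpdf_bookmark_lines and the sample data are hypothetical, and the backslash escaping of quotes inside titles is an assumption that should be checked against the cpdf manual:

```python
def cpdf_bookmark_lines(bookmarks, level=0):
    """Return one cpdf bookmark line per (page, label) pair."""
    lines = []
    for page, label in bookmarks:
        # Assumption: a backslash escapes backslashes and double quotes
        # inside the quoted title; verify this against the cpdf manual.
        title = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{level} "{title}" {page}')
    return lines


if __name__ == "__main__":
    # Hypothetical sample data: (page number, label) pairs.
    sample = [(5, "Introduction"), (10, "Main Content")]
    print("\n".join(cpdf_bookmark_lines(sample)))
```

Redirecting the printed lines into toc_cpdf.txt gives a file that can be passed to the cpdf command above.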

For more details on cpdf, consider the cpdf usage examples.

Upvotes: 4

benwiggy

Reputation: 2729

Here's a Python method for adding bookmarks to a PDF's outline (its table of contents). It runs on macOS without any other installations.

#!/usr/bin/python
from Foundation import NSURL, NSString
import Quartz

# You will need to change these filepaths to a local test pdf and an output file.
infile = "/path/to/file.pdf"
outfile = "/path/to/output.pdf"

def getOutline(page, label):
    # Create Destination
    myPage = myPDF.pageAtIndex_(page)
    pageSize = myPage.boundsForBox_(Quartz.kCGPDFMediaBox)
    x = 0
    y = Quartz.CGRectGetMaxY(pageSize)
    pagePoint = Quartz.CGPointMake(x,y)
    myDestination = Quartz.PDFDestination.alloc().initWithPage_atPoint_(myPage, pagePoint)
    myLabel = NSString.stringWithString_(label)
    myOutline = Quartz.PDFOutline.alloc().init()
    myOutline.setLabel_(myLabel)
    myOutline.setDestination_(myDestination)
    return myOutline

pdfURL = NSURL.fileURLWithPath_(infile)
myPDF = Quartz.PDFDocument.alloc().initWithURL_(pdfURL)
if myPDF:
    # Here's where you list your page index (starts at 0) and label.
    bookmarks = [(0, 'Page 1'), (1, 'Page 2'), (2, 'Page 3')]

    # Create a root outline and add one child outline per bookmark.
    rootOutline = Quartz.PDFOutline.alloc().init()
    for index, (page, label) in enumerate(bookmarks):
        rootOutline.insertChild_atIndex_(getOutline(page, label), index)
    myPDF.setOutlineRoot_(rootOutline)
    myPDF.writeToFile_(outfile)

Upvotes: 4

Kurt Pfeifle

Reputation: 90253

You can also use pdftk. It is also available for OS X.

I'm not going through all the details here and now, because it's been done elsewhere at great length already. Just briefly:

  1. Create a sample PDF from your original files (without bookmarks).
  2. Add some bookmarks with Adobe Acrobat (which you seem to have access to).
  3. Run one of these commands:

    pdftk my.pdf dump_data output -
    pdftk my.pdf dump_data output bookmarks+otherdata.txt
    
  4. Study the format of the output.

  5. Modify the output .txt file by adding all the entries you want.
  6. Run pdftk again:

    pdftk my.pdf update_info bookmarks.txt output bookmarked.pdf
    

Additional Information

This is the bookmark format I found when inspecting the output in step 4 above:

BookmarkBegin
BookmarkTitle: -- Your Title 1 --
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: -- Your Title 2 --
BookmarkLevel: 1
BookmarkPageNumber: 2
BookmarkBegin
BookmarkTitle: -- Your Title 3 --
...
...
and so on...

Add your entries in this format at the appropriate place in the .txt file, replacing the placeholder titles and page numbers with your own.
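With about 150 bookmarks, these entries are easier to generate than to type. Here is a minimal Python sketch that builds the block from (page, label) pairs, assuming flat level-1 bookmarks and plain ASCII titles. The function name pdftk_bookmark_block and the sample data are hypothetical:

```python
def pdftk_bookmark_block(bookmarks, level=1):
    """Return pdftk bookmark entries for (page, label) pairs as one string."""
    lines = []
    for page, label in bookmarks:
        lines.extend([
            "BookmarkBegin",
            f"BookmarkTitle: {label}",
            f"BookmarkLevel: {level}",
            f"BookmarkPageNumber: {page}",
        ])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data: (page number, label) pairs.
    sample = [(1, "Your Title 1"), (2, "Your Title 2")]
    print(pdftk_bookmark_block(sample), end="")
```

The printed block can be pasted or written into the .txt file that is passed to update_info.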

Upvotes: 30

Kurt Pfeifle

Reputation: 90253

Here is another answer. This one uses Ghostscript to process PDF-to-PDF and the pdfmark PostScript operator to insert the bookmarks.

For some introduction to the pdfmark topic, see also:

This method involves two steps:

  1. Create a text file (a PostScript file, really) with a limited set of pdfmark commands, one line per bookmark you want to add.
  2. Run a Ghostscript command that processes your current PDF file alongside the text file.

1.

The content of the text file should look something like this:

[/Page 1   /View [/XYZ null null null] /Title (This is page 1)         /OUT pdfmark
[/Page 2   /View [/XYZ null null null] /Title (Dunno which page this is....) /OUT pdfmark
[/Page 3   /View [/XYZ null null null] /Title (Some other name)        /OUT pdfmark
[/Page 4   /View [/XYZ null null null] /Title (File 4)                 /OUT pdfmark
[/Page 5   /View [/XYZ null null null] /Title (File 5)                 /OUT pdfmark
[/Page 6   /View [/XYZ null null null] /Title (File 6)                 /OUT pdfmark
[/Page 7   /View [/XYZ null null null] /Title (File 7)                 /OUT pdfmark
% more lines for more pages to bookmark...
[/Page 13  /View [/XYZ null null null] /Title (File 13)                /OUT pdfmark
[/Page 14  /View [/XYZ null null null] /Title (Bookmark for page 14)   /OUT pdfmark
% more lines for more pages to bookmark...

Name this file for example: addmybookmarks.txt
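If the page numbers and labels already exist as data, the pdfmark lines can be generated. Here is a minimal Python sketch, assuming ASCII titles. The function names ps_escape and pdfmark_lines and the sample data are hypothetical. Parentheses and backslashes are escaped because they are special inside PostScript strings:

```python
def ps_escape(text):
    """Escape backslashes and parentheses for a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def pdfmark_lines(bookmarks):
    """Return one pdfmark bookmark line per (page, label) pair."""
    return [
        f"[/Page {page} /View [/XYZ null null null] "
        f"/Title ({ps_escape(label)}) /OUT pdfmark"
        for page, label in bookmarks
    ]


if __name__ == "__main__":
    # Hypothetical sample data: (page number, label) pairs.
    sample = [(1, "This is page 1"), (2, "Figure (a)")]
    print("\n".join(pdfmark_lines(sample)))
```

Redirecting the printed lines into addmybookmarks.txt gives the file used in the next step.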

2.

Now run this command:

gs -o bookmarked.pdf   \
   -sDEVICE=pdfwrite   \
    addmybookmarks.txt \
   -f original.pdf

The resulting PDF, bookmarked.pdf, now contains the bookmarks. See this screenshot:

Screenshot of bookmarks added with the help of Ghostscript and pdfmark

Upvotes: 12

Kurt Pfeifle

Reputation: 90253

Ok, here is a quick'n'dirty way to do three jobs at once:

  1. Merge your 4000 single-page PDFs.
  2. Create a document top level ToC (Table of Contents).
  3. Create a PDF bookmark for each page.

It involves using a LaTeX installation.

You start with an empty LaTeX template like the following one:

\documentclass[]{article}
\usepackage{pdfpages}
\usepackage{hyperref}
    \hypersetup{breaklinks=true,
                bookmarks=true,
                pdfauthor={},
                pdftitle={},
                colorlinks=true,
                citecolor=blue,
                urlcolor=blue,
                linkcolor=magenta,
                pdfborder={0 0 0}}
\begin{document}

{
    \hypersetup{linkcolor=black}
    \setcounter{tocdepth}{3}
    % Comment next line in or out if you want a ToC or not:
    \tableofcontents
}

%% Here goes your additional code:
%% 1 line per included PDF!

\end{document}

Now just before the last line of this template, you insert one line per external PDF file you want to include.

  1. In case you want to generate a ToC, it has to be formatted like this:

    \includepdf[pages={<pagenumber>},addtotoc={<pagenumber>,<section>,<level>,%
                       <heading>,<label>}]{pdffilename.pdf}
    
  2. In case you are sure that each and every included PDF is a 1-page document, it simplifies to this:

    \includepdf[addtotoc={<pagenumber>,<section>,<level>,%
                         <heading>,<label>}]{pdffilename.pdf}
    

All five of the following addtotoc parameters are required, in the order given, for the files to appear in the bookmarks and in the ToC. See further below for a specific example:

  • <pagenumber> : Number of the page of inserted document to be linked to. (In your case always "1", because you insert 1-page documents only; you could insert a 5-page document and link to page 3 of the inserted PDF, though).
  • <section> : The LaTeX sectioning name. Could be section, subsection, subsubsection... In your case "section".
  • <level> : The level of the LaTeX section. In your case "1".
  • <heading> : This is a string. Used for the text of the bookmark.
  • <label> : This must be unique for each bookmark. Used internally in the PDF to jump to the correct page when the bookmark is clicked.

To test this quickly, I used Ghostscript to generate 20 1-page PDF documents:

for i in {1..20}; do
   gs -o p${i}.pdf -sDEVICE=pdfwrite               \
      -c "/Helvetica findfont 30 scalefont setfont \
          100 600 moveto                           \
          (Page ${i}) show                         \
          showpage"; 
done

With these test files I could make the lines to insert into the template look like these:

\includepdf[addtotoc={1,section,1,Page 1 (First),p1}]{p1.pdf}
\includepdf[addtotoc={1,section,1,Page 2,p2}]{p2.pdf}
\includepdf[addtotoc={1,section,1,Page 3,p3}]{p3.pdf}
[...]
\includepdf[addtotoc={1,section,1,Page 11 (In the Middle),p11}]{p11.pdf}
[...]
\includepdf[addtotoc={1,section,1,Page 20 (Last),p20}]{p20.pdf}
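For thousands of files these lines are better generated than typed. Here is a minimal Python sketch, assuming every included PDF has exactly one page and that headings contain no commas or LaTeX special characters. The function name includepdf_lines, the p<index> label scheme, and the sample data are hypothetical:

```python
def includepdf_lines(files):
    """Return one \\includepdf line per (filename, heading) pair."""
    lines = []
    for index, (filename, heading) in enumerate(files, start=1):
        # The label must be unique per bookmark, so derive it from the index.
        label = f"p{index}"
        lines.append(
            f"\\includepdf[addtotoc={{1,section,1,{heading},{label}}}]"
            f"{{{filename}}}"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical sample data: (PDF filename, bookmark heading) pairs.
    sample = [("p1.pdf", "Page 1 (First)"), ("p2.pdf", "Page 2")]
    print("\n".join(includepdf_lines(sample)))
```

The printed lines go into the template just before \end{document}.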

Save the template with the inserted lines, then run the following command twice:

 pdflatex template.tex
 pdflatex template.tex

The resulting file will have the bookmarks, looking like this in Preview.app:

Screenshot: Preview.app with the bookmarks opened


Note: LaTeX is available for OSX via two methods:


I'll add one or two other methods to insert bookmarks on the command line too, later or in the next few days, if I have more time.

For now this one has to do, because I never showed it here on SO, AFAICR.

But I thought because you gave the background "I'm merging 1-page PDFs, and it is slow; now I want to add bookmarks too...", I could show how to do it with one single method.

HINT : One of the other methods will be to use pdftk which IS available for Mac OS X!

Upvotes: 3
