BreadyX

Reputation: 57

Is there a way to discard previous pdfmark metadata?

I was trying to automate adding a title, bookmarks and similar metadata to some PDFs. The approach I came up with was to create a simple pdfmark script like this:

% pdfmark.ps
[ /Title (My document)
  /Author(Me)
  /DOCINFO pdfmark

[ /Title (First chapter)
  /Page 1
  /OUT pdfmark

Then I generate a new PDF with Ghostscript using:

gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf pdfmark.ps

If in.pdf doesn't contain any pdfmark metadata, this works fine. If it does, the result is wrong: for example, the title and author aren't modified, and the bookmarks are appended instead of replaced.

I'd rather not modify the PDF's corresponding PostScript, so I'm looking for a command I can add to pdfmark.ps that deletes (or overwrites) the previous metadata.

Upvotes: 3

Views: 702

Answers (2)

Ubirajara Cruz

Reputation: 1

First, I converted the PDF to PostScript with this command:

gswin32c.exe -q -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=file.ps file_with_bookmark_to_remove.pdf

After the file was converted, I converted it back to PDF and added pdfmark_file.ps:

gswin32c.exe -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=file_with_pdfmark.pdf file.ps pdfmark_file.ps

The first command converts the PDF to PostScript with the "ps2write" device. The resulting PostScript file does not contain any pdfmarks; they are all removed.

When you convert it again with the "pdfwrite" device, the PDF is created without the old pdfmarks, and the new pdfmarks from pdfmark_file.ps are added.
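
The two passes can be wrapped in a small shell helper. This is only a sketch: the function name strip_and_mark is hypothetical, and the binary name gs is an assumption for Unix-like systems (on Windows, set GS to gswin32c.exe as in the commands above).

```shell
#!/bin/sh
# Ghostscript binary. "gs" is an assumption for Unix-like systems;
# override it, e.g. GS=gswin32c.exe, to match your installation.
GS=${GS:-gs}

# strip_and_mark SRC.pdf MARKS.ps OUT.pdf   (hypothetical helper name)
# The body runs in a subshell so its variables do not leak to the caller.
strip_and_mark() (
    src=$1 marks=$2 out=$3
    tmp=$(mktemp "${TMPDIR:-/tmp}/strip.XXXXXX") || exit 1

    # Pass 1: PDF -> PostScript with the ps2write device.
    "$GS" -q -dNOPAUSE -dBATCH -sDEVICE=ps2write \
          -sOutputFile="$tmp" "$src" &&
    # Pass 2: PostScript plus the pdfmark file -> PDF with pdfwrite.
    "$GS" -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
          -sOutputFile="$out" "$tmp" "$marks"
    status=$?

    rm -f "$tmp"      # remove the intermediate PostScript file
    exit "$status"    # report the Ghostscript status to the caller
)

# Usage:
#   strip_and_mark file_with_bookmark_to_remove.pdf pdfmark_file.ps file_with_pdfmark.pdf
```

The intermediate PostScript goes to a temporary file that is deleted afterwards, so only the final PDF remains.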

Good Luck! Ubirajara

Upvotes: -1

urznow

Reputation: 1811

I'll leave PostScript to others and show how to remove a PDF outline using the qpdf package (for qpdf and fix-qdf) and GNU sed.

From the qpdf manual:

In QDF mode, qpdf creates PDF files in what we call QDF form. A PDF file in QDF form, sometimes called a QDF file, is a completely valid PDF file that has %QDF-1.0 as its third line (after the pdf header and binary characters) and has certain other characteristics. The purpose of QDF form is to make it possible to edit PDF files, with some restrictions, in an ordinary text editor.

(For a non-GNU/Linux system adapt the commands below.)

qpdf --qdf --compress-streams=n --decode-level=generalized \
     --object-streams=disable -- in.pdf - |
sed --binary \
    -e '/^[ ][ ]*\/Outlines [0-9][0-9]* [0-9] R/ s/[1-9]/0/g' |
fix-qdf > tmp.qdf
qpdf --coalesce-contents --compression-level=9 \
     --object-streams=generate -- tmp.qdf out.pdf

where:

  • 1st qpdf command converts the PDF file to QDF form for editing
  • sed orphans outlines in the QDF file by rooting them at non-existing obj 0
  • fix-qdf repairs the QDF after editing
  • 2nd qpdf converts and compresses QDF to PDF
  • qpdf input cannot be pipelined, it needs to seek

The sed command changes the nonzero digits to zeros on the line containing the indented text /Outlines. Note that GNU sed is used for its non-standard --binary option, which avoids mishaps on operating systems that distinguish between text and binary files. Similarly, to strip annotations, replace /Outlines with /Annots in the -e expression above, or add it as a second -e option to do both. Any patch utility other than sed will do; often just one byte has to be changed.

To quickly strip all non-page data (docinfo, outlines and so on, but not annotations), qpdf's --empty option may be useful:
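
To see what the sed expressions do in isolation, they can be run on a few sample lines. The lines below are made up for illustration (the object numbers are hypothetical); a real QDF file will differ.

```shell
# Hypothetical dictionary lines, indented the way the sed pattern expects.
patched=$(
    printf '%s\n' '  /Outlines 12 0 R' '  /Annots 45 0 R' '  /Pages 3 0 R' |
    sed -e '/^[ ][ ]*\/Outlines [0-9][0-9]* [0-9] R/ s/[1-9]/0/g' \
        -e '/^[ ][ ]*\/Annots [0-9][0-9]* [0-9] R/ s/[1-9]/0/g'
)
printf '%s\n' "$patched"
# Prints:
#   /Outlines 00 0 R
#   /Annots 00 0 R
#   /Pages 3 0 R
```

Only the lines matching one of the two addresses are rewritten; the /Pages reference is left alone. Because each digit is replaced by another digit, the line length stays the same.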

qpdf --coalesce-contents --compression-level=9 \
     --object-streams=generate \
     --empty --pages in.pdf 1-z -- out.pdf

Upvotes: 0
