HullCityFan852

Reputation: 232

Ghostscript resize PDF not working with ImageMagick PDF

I'm trying to write a small program for Linux to resize PDFs and adjust margins. My plan was to use Ghostscript as a back-end. This Terminal command successfully resizes most PDFs:

gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFIXEDMEDIA -dPDFFitPage \
 -dDEVICEWIDTHPOINTS=300 -dDEVICEHEIGHTPOINTS=400 -sOutputFile=out.pdf file.pdf

The -dPDFFitPage option scales pages to fit the new size, adding whitespace as padding if the image aspect ratio doesn't match the specified dimensions. Removing -dPDFFitPage changes the page size without scaling - pages will be cropped if too large, or whitespace added if too small.

However, the command doesn't work with PDFs created by ImageMagick's "convert" program. The PDF is scaled but no whitespace is added so only one dimension will be correct in the output file. Without the -dPDFFitPage option oversize images are cropped as expected, but nothing appears to happen if the image is smaller than the new page size (i.e. no whitespace is added).

It appears that the problem lies with the fact that the PDF is empty apart from the image. How can I get Ghostscript to adjust the page size and fill the empty part of the page with white if necessary?

Edit: Example files

To see the problem, try with these example files (there are also example Ghostscript output PDFs).

Alternatively, use ImageMagick (or any image editor) to create a suitable example image yourself:

convert -size 500x500 xc:skyblue -fill black -draw "circle 250,250 0,250" image.png

Now, use ImageMagick (NOT any other program) to convert it to a PDF:

convert image.png file.pdf

Now run the Ghostscript command above on this file.pdf and see what happens.

To see how it is supposed to work, try using any other tool to convert the example image to a PDF. You could (for example) use LibreOffice or LaTeX, or take the PDF you just made (the one that didn't work) and "Print" it to create another PDF (which for some reason will work). Make sure the image fills the entire page of the PDF (there should be no whitespace/border in the PDF you use to test with Ghostscript, but the output PDF created by Ghostscript should have some whitespace.)

Upvotes: 2

Views: 1179

Answers (2)

KenS

Reputation: 31139

Your original PDF file (NotWorking.pdf) contains a /CropBox in addition to a /MediaBox. This is carried through to the output PDF file, and due to the way that -dPDFFitPage works, it is appropriately modified in the same way as the actual content of the PDF file. The result is that the scaled file looks the same as the original.

It isn't the same: the original file has a /MediaBox of [0 0 500 500] and the modified file has a /MediaBox of [0 0 300 400]. But in a reader which enforces the /CropBox, the effect is that it looks the same.

Try running the two 'modified' files back through Ghostscript and see what happens. Ghostscript honours the /MediaBox, not the /CropBox, by default.

Once you've tried running the two output files through Ghostscript, try it with -dUseCropBox.
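As a rough sketch of that comparison (the function name and output file names are just placeholders I made up, and I'm assuming gs is on your PATH), you could re-render a file once with the default box handling and once with -dUseCropBox, then compare the two results:

```shell
# Re-render a PDF twice so the results can be compared:
#   <name>-mediabox.pdf  uses Ghostscript's default (the /MediaBox)
#   <name>-cropbox.pdf   uses -dUseCropBox (the /CropBox)
# rerender_both is a hypothetical helper, not part of Ghostscript.
rerender_both() {
    src="$1"
    base="${src%.pdf}"
    gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE \
       -sOutputFile="${base}-mediabox.pdf" "$src"
    gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dUseCropBox \
       -sOutputFile="${base}-cropbox.pdf" "$src"
}
```

For example, rerender_both "out(NotWorking).pdf" should leave two files next to the input whose page sizes you can inspect.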

You'll need to...

  • ...either disable the /CropBox,
  • ...or set it to be the same as the /MediaBox,

if you need to do a pdfmark operation. You might like to refer to this answer for some more pointers.

Upvotes: 1

Kurt Pfeifle

Reputation: 90243

Just an additional pointer...

  • In cases where there are already /CropBox definitions in an input PDF file, the method to provide one via a -c "[...pdfmark" parameter for Ghostscript will not work!

In these cases it often helps to first "disarm" the existing /CropBox keyword inside the PDF file by changing it to lower case: make it read /cropBox (since PDF keywords are case sensitive, it will no longer be recognized/used).

You can do this with any method at your disposal: text editor (use one that doesn't change your EOL characters behind your back!), or sed, or...
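A minimal sed sketch of this, assuming GNU sed and a PDF whose page objects are not compressed (disarm_cropbox is a name I made up for illustration). Because /CropBox and /cropBox have the same length, the byte offsets recorded in the PDF's xref table stay valid:

```shell
# Hypothetical helper: write a copy of a PDF with every /CropBox key
# renamed to /cropBox so that PDF consumers no longer recognise it.
# LC_ALL=C makes sed treat the file as plain bytes instead of text
# in the current locale. The input file is left untouched.
disarm_cropbox() {
    # $1 = input PDF, $2 = output PDF
    LC_ALL=C sed 's#/CropBox#/cropBox#g' "$1" > "$2"
}
```

You would call it as disarm_cropbox NotWorking.pdf disarmed.pdf and then feed disarmed.pdf to Ghostscript. It will not catch a /CropBox that sits inside a compressed stream.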

To check whether any *Box entries are defined which differ from the default /MediaBox (which MUST be there in each PDF file), you can use pdfinfo -box. This command will always report values not only for /MediaBox, but also for /CropBox, /BleedBox, /ArtBox and /TrimBox. In cases where /CropBox, /BleedBox, /ArtBox and /TrimBox are not explicitly defined in the PDF document, the tool will report the same values as are set for /MediaBox:

$ pdfinfo -box "out(NotWorking).pdf"

 Title:          NotWorking
 Producer:       GPL Ghostscript 9.15
 CreationDate:   Sun May 24 00:38:55 2015
 ModDate:        Sun May 24 00:38:55 2015
 Tagged:         no
 UserProperties: no
 Suspects:       no
 Form:           none
 JavaScript:     no
 Pages:          1
 Encrypted:      no
 Page size:      300 x 300 pts
 Page rot:       0
 MediaBox:           0.00     0.00   300.00   400.00
 CropBox:            0.00    50.00   300.00   350.00
 BleedBox:           0.00    50.00   300.00   350.00
 TrimBox:            0.00    50.00   300.00   350.00
 ArtBox:             0.00    50.00   300.00   350.00
 File size:      16316 bytes
 Optimized:      no
 PDF version:    1.5

However, this does not help in cases where the /CropBox definition is explicitly there, but set to the same values as the /MediaBox:

$ pdfinfo -box NotWorking.pdf

 Title:          NotWorking
 Producer:       ImageMagick 6.8.9-9 Q16 x86_64 2015-01-06 http://www.imagemagick.org
 CreationDate:   Sun May 24 00:21:28 2015
 ModDate:        Sun May 24 00:21:28 2015
 Tagged:         no
 UserProperties: no
 Suspects:       no
 Form:           none
 JavaScript:     no
 Pages:          1
 Encrypted:      no
 Page size:      500 x 500 pts
 Page rot:       0
 MediaBox:           0.00     0.00   500.00   500.00
 CropBox:            0.00     0.00   500.00   500.00
 BleedBox:           0.00     0.00   500.00   500.00
 TrimBox:            0.00     0.00   500.00   500.00
 ArtBox:             0.00     0.00   500.00   500.00
 File size:      12343 bytes
 Optimized:      no
 PDF version:    1.4

In these cases you must look into the PDF source code, or run:

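for i in *.pdf ; do
   echo "$i"        # file name (quoted, so names with spaces survive)
   printf '  '
   # -a: treat the binary PDF as text; -o: print only the match; -P: Perl regex (GNU grep)
   grep -a -o --color -P "/.*?Box.*?]" "$i"
   echo
done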

NotWorking.pdf
  /MediaBox [0 0 500 500]
  /CropBox [0 0 500 500]

Working.pdf
  /MediaBox [ 0 0 500 500 ]

out(NotWorking).pdf
  /Type/Page/MediaBox [0 0 300 400]
  /CropBox [0 50.0 300.0 350.0]

out(Working).pdf
  /Type/Page/MediaBox [0 0 300 400]

As you can see, the file NotWorking.pdf did have its own explicit /CropBox value pre-set already...
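The /CropBox [0 50.0 300.0 350.0] in out(NotWorking).pdf is what you would expect if the 500x500 page is scaled uniformly to fit 300x400 and then centred. Here is a small sketch of that arithmetic (fit_box is a hypothetical helper of mine, and the scale-then-centre behaviour is an assumption inferred from the numbers in this one file, not taken from the Ghostscript documentation):

```shell
# Hypothetical helper: compute the box a page occupies after being scaled
# uniformly to fit a new page size and centred on it.
# Arguments: source width, source height, target width, target height (points).
fit_box() {
    awk -v sw="$1" -v sh="$2" -v tw="$3" -v th="$4" 'BEGIN {
        # the smaller of the two ratios is used, so nothing gets cropped
        s = (tw / sw < th / sh) ? tw / sw : th / sh
        w = sw * s; h = sh * s
        # centre the scaled page on the new page
        x = (tw - w) / 2; y = (th - h) / 2
        printf "%g %g %g %g\n", x, y, x + w, y + h
    }'
}

fit_box 500 500 300 400   # prints: 0 50 300 350
```

The scale factor is min(300/500, 400/500) = 0.6, which gives a 300x300 image with 50 pt left over above and below. That is exactly the region the carried-over /CropBox selects, which is why the padding never shows up in a viewer.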

One more caveat, be warned:

My grep command given above will not discover the /CropBox setting in cases where the respective PDF object is obscured by being embedded into an /ObjStm object ("object stream").

Upvotes: 0
