Reputation: 232
I'm trying to write a small program for Linux to resize PDFs and adjust margins. My plan was to use Ghostscript as a back-end. This Terminal command successfully resizes most PDFs:
gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFIXEDMEDIA -dPDFFitPage \
-dDEVICEWIDTHPOINTS=300 -dDEVICEHEIGHTPOINTS=400 -sOutputFile=out.pdf file.pdf
The -dPDFFitPage option scales pages to fit the new size, adding whitespace as padding if the image aspect ratio doesn't match the specified dimensions. Removing -dPDFFitPage changes the page size without scaling: pages will be cropped if too large, or whitespace added if too small.
However, the command doesn't work with PDFs created by ImageMagick's "convert" program. The PDF is scaled but no whitespace is added, so only one dimension will be correct in the output file. Without the -dPDFFitPage option, oversize images are cropped as expected, but nothing appears to happen if the image is smaller than the new page size (i.e. no whitespace is added).
It appears that the problem lies with the fact that the PDF is empty apart from the image. How can I get Ghostscript to adjust the page size and fill the empty part of the page with white if necessary?
Edit: Example files
To see the problem, try with these example files (there are also example Ghostscript output PDFs).
Alternatively, use ImageMagick (or any image editor) to create a suitable example image yourself:
convert -size 500x500 xc:skyblue -fill black -draw "circle 250,250 0,250" image.png
Now, use ImageMagick (NOT any other program) to convert it to a PDF:
convert image.png file.pdf
Now run the Ghostscript command above on this file and see what happens, both with -dPDFFitPage and without it.
To see how it is supposed to work, try using any other tool to convert the example image to a PDF. You could (for example) use LibreOffice or LaTeX, or take the PDF you just made (the one that didn't work) and "Print" it to create another PDF (which for some reason will work). Make sure the image fills the entire page of the PDF (there should be no whitespace/border in the PDF you use to test with Ghostscript, but the output PDF created by Ghostscript should have some whitespace.)
Upvotes: 2
Views: 1179
Reputation: 31139
Your original PDF file (NotWorking.pdf) contains a /CropBox in addition to a /MediaBox. This is carried through to the output PDF file, and due to the way that -dPDFFitPage works, it is appropriately modified in the same way as the actual content of the PDF file. The result is that the scaled file looks the same as the original.
It isn't the same: the original file has a /MediaBox of [0 0 500 500] and the modified file has a /MediaBox of [0 0 300 400]. But the effect is that it looks the same in a reader which enforces the /CropBox.
Try running the two 'modified' files back through Ghostscript and see what happens. Ghostscript honours the /MediaBox, not the /CropBox, by default.
Once you've tried running the two output files through Ghostscript, try it with -dUseCropBox.
You'll need to... /CropBox, /MediaBox, if need to do a pdfmark operation. You might like to refer to this answer for some more pointers.
Upvotes: 1
Reputation: 90243
Just an additional pointer... If there are existing /CropBox definitions in an input PDF file, the method to provide one via a -c "[...pdfmark" parameter for Ghostscript will not work! In these cases it often helps to first "disarm" the existing /CropBox keyword inside the PDF file by changing it to lower case: make it read /cropBox (since PDF keywords are case sensitive, it will no longer be recognized/used).
You can do this with any method at your disposal: a text editor (use one that doesn't change your EOL characters behind your back!), or sed, or...
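As a minimal sketch of the sed route, assuming GNU sed and a PDF in which /CropBox appears as plain text (not inside a compressed object stream): the function name disarm_cropbox and the file name disarmed.pdf are made up for illustration. Because /cropBox has the same byte length as /CropBox, the byte offsets in the file's cross-reference table should stay valid.

```shell
#!/bin/sh
# Hypothetical helper: copy a PDF while renaming /CropBox to /cropBox.
# $1 = input PDF, $2 = output PDF (both are placeholders chosen by the caller).
disarm_cropbox() {
    # LC_ALL=C makes sed treat the file as raw bytes rather than
    # text in the current locale (assumes GNU sed).
    LC_ALL=C sed 's|/CropBox|/cropBox|g' "$1" > "$2"
}

# Possible usage, followed by the resize command from the question:
#   disarm_cropbox NotWorking.pdf disarmed.pdf
#   gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFIXEDMEDIA -dPDFFitPage \
#      -dDEVICEWIDTHPOINTS=300 -dDEVICEHEIGHTPOINTS=400 \
#      -sOutputFile=out.pdf disarmed.pdf
```

Writing to a new file instead of editing in place keeps the original around in case the result turns out to be damaged.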
To check whether there are *Boxes defined which are different from the default /MediaBox (which MUST be there in each PDF file), you can use pdfinfo -box. This command will always report values not only for /MediaBox, but also for /CropBox, /BleedBox, /ArtBox and /TrimBox. In cases where /CropBox, /BleedBox, /ArtBox and /TrimBox are not explicitly defined in the PDF document, the tool will report the same values as are set for /MediaBox:
$ pdfinfo -box "out(NotWorking).pdf"
Title: NotWorking
Producer: GPL Ghostscript 9.15
CreationDate: Sun May 24 00:38:55 2015
ModDate: Sun May 24 00:38:55 2015
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 300 x 300 pts
Page rot: 0
MediaBox: 0.00 0.00 300.00 400.00
CropBox: 0.00 50.00 300.00 350.00
BleedBox: 0.00 50.00 300.00 350.00
TrimBox: 0.00 50.00 300.00 350.00
ArtBox: 0.00 50.00 300.00 350.00
File size: 16316 bytes
Optimized: no
PDF version: 1.5
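If you want to script that comparison, here is a rough sketch that reads pdfinfo -box output on standard input and succeeds only when the reported CropBox differs from the MediaBox. The function name boxes_differ is invented for this example, and it assumes the single-page output layout shown above (lines starting with "MediaBox:" and "CropBox:").

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if CropBox and MediaBox differ
# in `pdfinfo -box` output read from stdin, non-zero otherwise.
boxes_differ() {
    awk '
        $1 == "MediaBox:" { m = $2 " " $3 " " $4 " " $5 }
        $1 == "CropBox:"  { c = $2 " " $3 " " $4 " " $5 }
        # awk exit code 0 means "differ"; 1 means equal or not found
        END { exit !(m != "" && c != "" && m != c) }
    '
}

# Possible usage:
#   pdfinfo -box "out(NotWorking).pdf" | boxes_differ && echo "CropBox differs"
```

Like pdfinfo itself, this cannot tell an explicitly defined /CropBox with the same values as /MediaBox from one that is simply absent.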
However, this does not help in cases where the /CropBox definition is explicitly there, but set to the same values as /MediaBox:
$ pdfinfo -box NotWorking.pdf
Title: NotWorking
Producer: ImageMagick 6.8.9-9 Q16 x86_64 2015-01-06 http://www.imagemagick.org
CreationDate: Sun May 24 00:21:28 2015
ModDate: Sun May 24 00:21:28 2015
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 500 x 500 pts
Page rot: 0
MediaBox: 0.00 0.00 500.00 500.00
CropBox: 0.00 0.00 500.00 500.00
BleedBox: 0.00 0.00 500.00 500.00
TrimBox: 0.00 0.00 500.00 500.00
ArtBox: 0.00 0.00 500.00 500.00
File size: 12343 bytes
Optimized: no
PDF version: 1.4
In these cases you must look into the PDF source code, or run:
for i in *.pdf ; do
    echo "$i"        # print the file name
    echo -n " "
    # -a: read binary as text, -o: print only the match, -P: Perl regex (lazy .*?)
    grep -a -o --color -P "/.*?Box.*?]" "$i"
    echo
done
NotWorking.pdf
/MediaBox [0 0 500 500]
/CropBox [0 0 500 500]
Working.pdf
/MediaBox [ 0 0 500 500 ]
out(NotWorking).pdf
/Type/Page/MediaBox [0 0 300 400]
/CropBox [0 50.0 300.0 350.0]
out(Working).pdf
/Type/Page/MediaBox [0 0 300 400]
As you can see, the file NotWorking.pdf did have its own explicit /CropBox value pre-set already...
One more caveat, be warned: my grep command given above will not discover the /CropBox setting in cases where the respective PDF object is obscured by being embedded into an /ObjStm object ("object stream").
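One possible way around that caveat, assuming the separate qpdf tool is installed (it is not mentioned anywhere above, so treat this as a suggestion only): ask qpdf to rewrite the file without object streams, then search the rewritten copy. The function name list_boxes is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: list the *Box entries of a PDF after expanding
# object streams with qpdf. $1 = PDF to inspect.
list_boxes() {
    tmp=$(mktemp) || return 1
    # --qdf writes an uncompressed, editable form;
    # --object-streams=disable keeps objects out of /ObjStm containers.
    qpdf --qdf --object-streams=disable "$1" "$tmp"
    # -a: read binary as text, -o: print only the matching part
    grep -a -o -E '/(Media|Crop|Bleed|Trim|Art)Box *\[[^]]*\]' "$tmp"
    rm -f "$tmp"
}

# Possible usage:
#   list_boxes NotWorking.pdf
```

The rewritten copy is only used for inspection here; the original file is left untouched.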
Upvotes: 0