Reputation: 123
I am using this command to convert pages from a pdf to jpeg images:
magick convert -density 300 sample.pdf output.jpeg
I see a white background and the content of the PDF appears as a smaller image stuck to the bottom left corner of the white "canvas". Can anyone help with why this might be happening and how to prevent this "shrinking"?
My PDF has 14 pages. Here is the metadata for a few of those pages:
>magick identify sample.pdf
sample.pdf[0] PDF 2286x3600 2286x3600+0+0 16-bit sRGB 6458B 0.016u 0:00.017
sample.pdf[1] PDF 2286x3600 2286x3600+0+0 16-bit sRGB 6018B 0.016u 0:00.020
sample.pdf[2] PDF 2286x3600 2286x3600+0+0 16-bit sRGB 5732B 0.016u 0:00.023
And here are the actual and expected outputs for one of the pages:
expected output:
edit: here is a sample PDF:
https://www.dropbox.com/s/0bzu5brfzbedd7i/sample.pdf?dl=0
Upvotes: 0
Views: 2973
Reputation: 11736
Thanks for the example
> magick identify sample.pdf
> sample.pdf[0] PDF 2286x3600
Apears to be wrong as there is no match
from the PDF contents
/Width 1531
/Im0
/Height 2454
/MediaBox [0 0 1531 2454]
on
Page Size:
/CropBox [0 0 919 1473]
919 pt x 1473 pt
32.42 x 51.96 cm
12.76 x 20.45 inches
Therefore no problems when the images were inserted as @ 120 dpi
We can check the image by copy when zoom to 100% in a viewer and paste into say paint, which agrees the image is 1531 x 2454 pixels.
As a result of comments with @fmw42, it was decided to see if GhostScript (which ImageMagick depends on for PDF handling) was having an affect, and certainly processing that PDF using GS v 9.55 without any special switches gave warnings and produced the output below left So the issue seems to be caused by recent GhostScript method of calling/scaling. since using simple GhostScript based image apps (Irfanview using GS plugin on the left) behave the same whilst other viewers have less of a problem even sister product MuPDF as previewed on the right. So the file Media Box as seen and probably used for scaling by Ghostscript seems to be the culprit, but was processed by two other PDF handlers during generation.
One solution would be to use a simpler method of extracting images as PNG thus look at Xpdf command line tools "pdftopng" which gives a good result but you need to calculate that the optimum resolution in this case is 120 (or 240), Typical windows command line does not need .exe but its best to use that when prefixing with a path for use from another location.
pdftopng.exe -r 120 -f 1 -l 1 sample.pdf
Upvotes: 1
Reputation: 53089
I am not sure why you have that behavior. There is something in the PDF, perhaps a crop box, that Imagemagick/Ghostscript is not picking up. But you can get rid of the excess white using -trim
magick sample.pdf -trim sample_%d.jpg
Upvotes: 2