Reputation: 31580
I've run a few tests using Imagick::getImageResolution on a PDF, and I can't figure out how to get the resolution (and colourspace) of an image embedded in a PDF. I've tried ripping the image out of the PDF, but during that process the DPI seems to be arbitrarily set to 72 no matter what I do.
I saw in 1564529 that someone said DPI doesn't matter to a PDF, but that is not true: when an image is embedded in a PDF, several attributes of the image, such as resolution, are defined in the PostScript. Is there a way in PHP (possibly with PSLib?) to find out the DPI of an embedded image?
Upvotes: 0
Views: 2263
Reputation: 11727
DPI doesn't matter to a PDF
As KenS points out, the same image can be shown at any number of pixels per inch. (There are no inches in a PDF either: units start out as nominally 72 "big" printer's points, assumed to be roughly 1", but 72 units can just as well be 2" if the file sets its own user units.)
A paper size such as A4, A5 or A3 is applied at printing time, and that is when the scale is fixed, so that a dot may be 1/300 or 1/150 etc. of an inch of paper.
Here is one image shown at 2 display sizes. The image is simply 24 x 25 pixels at any scale, and may be compressed losslessly (like GIF, PNG or TIFF, at a nominal 96 canvas units on import/export) or converted to JPEG at a notional 72 printer's points per inch.
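A PDF info tool such as pdfimages, or similar, will show both current ppi values. One is double the other, but both placements use the same image, object number 4. So whether it reports about 25 ppi or about 50 ppi does not matter: it is the same 24 x 25 pixels, drawn about 1" and about 0.5" across when printed at 1:1 on a 600 dpi printer.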
pdfimages -list "HelloWorldR&W.pdf"
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image      24    25  index   1   1  image  no         4  0    49    50   75B 100%
   1     1 image      24    25  index   1   1  image  no         4  0    25    25   75B 100%
%PDF-1.0
%Åѧ¡
1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj
2 0 obj <</Type/Pages/Count 1/Kids[3 0 R]>> endobj
3 0 obj <</Type/Page/MediaBox[0 0 144 144]/Rotate 0/Resources<</XObject<</Img0 4 0 R>>>>/Contents 5 0 R/Parent 2 0 R>> endobj
4 0 obj <</Type/XObject/Subtype/Image/Height 25/Width 24/BitsPerComponent 1/Length 75/ColorSpace[/Indexed/DeviceRGB 1<FF0000FFFFFF>]>> stream
ÿÿÿÿÿÿÀmß[}ÑoEÑ[EÑqEßE}ÀUÿñÿÁ«Á¬ÛZcýÖÇÈ"}ÿÕïÀMsß`§Ñ]9ÑNÑE·ßLÇÀA[ÿÿÿÿÿÿ
endstream
endobj
5 0 obj <</Length 101>> stream
q
1 0 0 -1 18 54 cm
35 0 0 -36 0 36 cm
/Img0 Do
Q
q
1 0 0 -1 71 144 cm
70 0 0 -72 0 72 cm
/Img0 Do
Q
endstream
endobj
xref
0 6
0000000000 00001 f
0000000015 00000 n
0000000060 00000 n
0000000111 00000 n
0000000237 00000 n
0000000472 00000 n
trailer
<</Size 6/Info<</Producer(JScrip2pdf)>>/Root 1 0 R>>
startxref
623
%%EOF
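As a check on the ppi figures in the listing, the arithmetic can be reproduced from the two scaling `cm` lines in the content stream above. This is only a minimal Python sketch: the helper name `ppi` is made up for illustration, and it assumes the default user space unit of 1/72 inch with no rotation or skew.

```python
# Image object 4 is 24 x 25 pixels (from /Width and /Height).
WIDTH_PX, HEIGHT_PX = 24, 25
POINTS_PER_INCH = 72  # default user space; assumes no /UserUnit entry


def ppi(pixels, size_in_points):
    """Pixels per inch for an image edge drawn size_in_points long."""
    return pixels / (size_in_points / POINTS_PER_INCH)


# Absolute scale factors from the second "cm" line before each Do:
# "35 0 0 -36 0 36 cm" and "70 0 0 -72 0 72 cm".
placements = {"first": (35, 36), "second": (70, 72)}

for name, (w_pt, h_pt) in placements.items():
    print(name, round(ppi(WIDTH_PX, w_pt), 1), round(ppi(HEIGHT_PX, h_pt), 1))
```

The results come out near 49.4 x 50 ppi and 24.7 x 25 ppi, which agrees with the whole-number values in the listing.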
Upvotes: 0
Reputation: 31139
The 'dpi' of an image in PDF (or PostScript) is more nebulous than you may think. This is because it is possible to render the PDF at different scales, and so the actual dpi will vary.
You are correct that there is information regarding the scale factor of the image embedded in the document. This is the Current Transformation Matrix, but it is not as simple as a single value, or even a single matrix.
The CTM maps co-ordinates into an idealised 'user space' which is nominally defined in points (72 per inch), but is infinitely subdivisible. When it comes to rendering, the 'user space' has a further transformation applied to scale it properly to the 'device space'; this transformation is required because the device probably isn't 72 dpi.
You can find a much fuller explanation of this in the PDF Reference Manual, especially section 4.2.1 in the 1.7 reference.
So it would seem that all you need to do is take the declared /Width and /Height from the image dictionary, and apply the /Matrix to determine how big the image is in user space. Given that user space is effectively 72 dpi, then you would know how many inches the image was scaled to, how many pixels the image contains, and a simple division would give you the answer you want.
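For that simple case, the division might be sketched like this in Python. The function name `image_dpi` and the sample numbers are assumptions for illustration only. The sketch presumes the matrix passed in is already the complete transformation in effect when the image is drawn, and that user space is at its default 72 units per inch.

```python
import math


def image_dpi(width_px, height_px, matrix, units_per_inch=72.0):
    """Return (x_dpi, y_dpi) for an image drawn under a PDF matrix.

    matrix is the six-number form [a b c d e f]. An image occupies the
    unit square, so its edges map to the vectors (a, b) and (c, d).
    """
    a, b, c, d, _e, _f = matrix
    width_units = math.hypot(a, b)   # drawn width in user space units
    height_units = math.hypot(c, d)  # drawn height in user space units
    return (width_px / (width_units / units_per_inch),
            height_px / (height_units / units_per_inch))


# Hypothetical 600 x 300 pixel image drawn 144 x 72 points (2" x 1").
print(image_dpi(600, 300, [144, 0, 0, 72, 0, 0]))
# The same placement rotated by 90 degrees gives the same answer.
print(image_dpi(600, 300, [0, 144, -72, 0, 0, 0]))
```

Using the lengths of the edge vectors rather than just `a` and `d` keeps the result correct when the image is rotated.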
Indeed, in many cases this will work. However, one of the problems from your point of view is that it is possible, indeed common, to concatenate matrices to affect the current scaling, so simply looking at the matrix applied to an image won't give you the scale factor applied to that image, because something else may have already scaled the CTM. In addition, PDF contains the 'UserUnit' kludge, which allows a file to alter the default scaling of user space.
So the only way to work out the 'dpi' of an image is to interpret the page description to the point where the image is rendered, work out the total scaling at that point and from there figure out how much area the image covers. Then given the width and height of the image, work out its dpi.
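A rough Python sketch of that bookkeeping follows. It is nowhere near a full interpreter: it assumes an uncompressed content stream and only understands the `q`, `Q`, `cm` and `Do` operators, ignoring form XObjects, inline images and everything else. The names `multiply` and `image_placements` and the sample stream are hypothetical.

```python
import math


def multiply(m, n):
    """Return m x n for PDF matrices [a b c d e f] (m is applied first)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]


def image_placements(content, user_unit=1.0):
    """Return (name, width_in, height_in) for each Do in a content stream."""
    ctm = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    stack, operands, found = [], [], []
    for token in content.split():
        if token == "q":            # save graphics state
            stack.append(ctm)
        elif token == "Q":          # restore graphics state
            ctm = stack.pop()
        elif token == "cm":         # concatenate a matrix onto the CTM
            ctm = multiply([float(x) for x in operands[-6:]], ctm)
        elif token == "Do":         # paint an XObject (assumed to be an image)
            found.append((operands[-1],
                          math.hypot(ctm[0], ctm[1]) * user_unit / 72.0,
                          math.hypot(ctm[2], ctm[3]) * user_unit / 72.0))
        else:
            operands.append(token)  # operand: keep it for the next operator
            continue
        operands = []
    return found


# Hypothetical stream: an outer 0.5 scale wraps an image matrix of 144 units.
stream = "q 0.5 0 0 0.5 0 0 cm q 144 0 0 144 10 10 cm /Im1 Do Q Q"
for name, w_in, h_in in image_placements(stream):
    # Assuming /Im1 is 100 x 100 pixels.
    print(name, 100 / w_in, 100 / h_in)
```

Looking only at the inner matrix would suggest the image is 2" wide, but the outer scale halves that, so the image is really drawn 1" wide. Passing `user_unit=2.0` (as a page's /UserUnit would) doubles the size again.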
In passing, here's a conundrum for you: it's entirely possible to draw the same image multiple times in PDF, using the same image data. You only have to include the image data once. If I draw an image which is 100 pixels by 100 pixels and I draw it to cover one square inch, the resolution is 100 dpi. Now I draw the same image, but I scale it to cover half an inch. The resolution of the rendered image is now 200 dpi.
So what is the 'dpi of the image' ?
Upvotes: 6