Reputation: 81
I'm writing a Python script to get the DPI of a PDF page. To get the DPI of a scanned PDF, I am using the pdfimages command.
$ pdfimages -list test.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1096  2074  gray    1   8  image  yes        9  0   500   500  536K  24%
I get the DPI from the x-ppi and y-ppi fields, and I run the above command from my program with the subprocess module. But when I try this with a machine-generated PDF, it gives me only the header, with no image rows:
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
Can someone please help me get the DPI of a machine-generated PDF from the Ubuntu command line or from Python?
Upvotes: 0
Views: 2366
Reputation: 16184
PDFs don't have a "DPI": they mostly encode vector graphics, which can be rasterized at arbitrary resolutions. The images you're extracting are also just arbitrary 2D arrays of pixels; what determines their "DPI" is the set of PDF commands in the file that specify the size at which each image is rendered on the page.
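To make that concrete, here is a minimal sketch of the arithmetic, assuming the usual PDF unit of 1 point = 1/72 inch. The function name and the drawn widths are made up for illustration; only the 1096-pixel width comes from the question.

```python
def effective_ppi(pixels: int, drawn_points: float) -> float:
    """Pixels per inch of an image that is drawn `drawn_points` wide on the page.

    PDF user space defaults to 72 points per inch, so the drawn width in
    inches is drawn_points / 72.
    """
    return pixels / (drawn_points / 72.0)


# The 1096-pixel-wide scan from the question, drawn 157.824 pt (2.192 in) wide
# (an assumed size), comes out at about 500 ppi.
print(effective_ppi(1096, 157.824))

# The same pixels drawn twice as wide come out at about 250 ppi.
print(effective_ppi(1096, 315.648))
```

The pixel data is identical in both calls; only the size it is drawn at changes the result.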
An image stored in a PDF can be displayed multiple times at different sizes (though mostly it's just once), and hence the same image can appear multiple times in the output of -list. The source code does seem to reference the transform matrix, so it's probably doing the right thing.
The code also doesn't seem to have any way of not doing this, so I'm not sure what you mean by it failing with a "machine-generated PDF".
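If it helps, below is a rough sketch of reading the -list output from Python. It assumes the 16-column layout shown in the question (where "object ID" is two fields) and integer x-ppi/y-ppi values; the function names are my own, not part of any library. An empty result simply means the PDF contains no embedded images.

```python
import subprocess


def parse_pdfimages_list(text):
    """Parse `pdfimages -list` output into one dict per listed image.

    Assumes the 16-column layout from the question; x-ppi and y-ppi are
    read relative to the end of each row.
    """
    rows = []
    in_body = False
    for line in text.splitlines():
        if line.startswith("---"):
            in_body = True  # everything after the dashed rule is data
            continue
        if not in_body:
            continue
        fields = line.split()
        if len(fields) < 16:
            continue  # skip blank or unexpected lines
        rows.append({
            "page": int(fields[0]),
            "width": int(fields[3]),
            "height": int(fields[4]),
            "x_ppi": int(fields[-4]),
            "y_ppi": int(fields[-3]),
        })
    return rows


def list_pdf_images(pdf_path):
    """Run `pdfimages -list` on a file and return the parsed rows."""
    result = subprocess.run(
        ["pdfimages", "-list", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_pdfimages_list(result.stdout)
```

Calling list_pdf_images("test.pdf") on the scanned file should give one row per image occurrence, so an image drawn twice shows up twice, and a PDF without images gives an empty list rather than an error.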
Upvotes: 2
Reputation: 1744
The machine-generated PDF is probably a vector PDF, whereas your scanned PDF is a raster PDF. DPI has no meaning in a vector PDF: it contains no embedded images, so pdfimages has nothing to list and reports no x-ppi or y-ppi.
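Put differently, for a vector page the DPI is something you choose when rasterizing, not something stored in the file. A small sketch, assuming a US Letter page of 612 x 792 points (the page size and function name are only for illustration):

```python
def raster_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page rasterized at `dpi` (72 points per inch)."""
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


# The same 612 x 792 pt page rendered at two different resolutions.
print(raster_size(612, 792, 72))   # (612, 792)
print(raster_size(612, 792, 300))  # (2550, 3300)
```

Both renderings are equally valid outputs of the same vector page.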
Upvotes: 0