Reputation: 1286
I have about 50-60 pdf files (images) that are 1.5MB large each. Now I don't want to have such large pdf files in my thesis as that would make downloading, reading and printing a pain in the rear. So I tried using ghostscript to do the following:
gs \
-dNOPAUSE -dBATCH \
-sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS="/screen" \
-sOutputFile=output.pdf \
L_2lambda_max_1wl_E0_1_zg.pdf
However, now my 1.4MB pdf is 1.5MB large.
What did I do wrong? Is there some way I can check the resolution of the pdf file? I just need 300dpi images, so would anyone suggest using convert to change the resolution, or is there some way I could change (reduce) the image resolution with gs, since the image is very grainy when I use convert?
How I use convert:
convert \
-units PixelsPerInch \
~/Desktop/L_2lambda_max_1wl_E0_1_zg.pdf \
-density 600 \
~/Desktop/output.pdf
http://dl.dropbox.com/u/13223318/L_2lambda_max_1wl_E0_1_zg.pdf
Upvotes: 13
Views: 27924
Reputation: 90335
If you run Ghostscript with -dPDFSETTINGS=/screen, this is just a sort of shortcut. In fact you'll get (implicitly) a whole bunch of settings used, which you can query with the following command:
gs \
-dNODISPLAY \
-c ".distillersettings {exch ==only ( ) print ===} forall quit" \
| grep '/screen'
On my Ghostscript (v9.06prerelease) I get the following output (slightly edited to increase readability):
/screen
<< /DoThumbnails false
/MonoImageResolution 300
/ColorImageDownsampleType /Average
/PreserveEPSInfo false
/ColorConversionStrategy /sRGB
/GrayImageDownsampleType /Average
/EmbedAllFonts true
/CannotEmbedFontPolicy /Warning
/PreserveOPIComments false
/GrayImageResolution 72
/GrayACSImageDict <<
/ColorTransform 1
/QFactor 0.76
/Blend 1
/HSamples [2 1 1 2]
/VSamples [2 1 1 2]
>>
/ColorImageResolution 72
/PreserveOverprintSettings false
/CreateJobTicket false
/AutoRotatePages /PageByPage
/MonoImageDownsampleType /Average
/NeverEmbed [/Courier
/Courier-Bold
/Courier-Oblique
/Courier-BoldOblique
/Helvetica
/Helvetica-Bold
/Helvetica-Oblique
/Helvetica-BoldOblique
/Times-Roman
/Times-Bold
/Times-Italic
/Times-BoldItalic
/Symbol
/ZapfDingbats]
/ColorACSImageDict <<
/ColorTransform 1
/QFactor 0.76
/Blend 1
/HSamples [2 1 1 2]
/VSamples [2 1 1 2] >>
/CompatibilityLevel 1.3
/UCRandBGInfo /Remove
>>
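The resolution keys in this dictionary can also be set one by one on the command line instead of through the /screen shortcut. The following is only a rough sketch of that idea: the function name downsample_pdf, the DRY_RUN switch and the file names input.pdf and output.pdf are made up for illustration, and the settings only matter for PDFs that actually contain raster images.

```shell
#!/bin/sh
# Sketch: ask pdfwrite to downsample embedded images to a chosen resolution.
# downsample_pdf, DRY_RUN, input.pdf and output.pdf are hypothetical names.
# Usage: downsample_pdf DPI INPUT OUTPUT
downsample_pdf() {
    dpi="$1"; in="$2"; out="$3"
    # With DRY_RUN set, the command is only printed, not executed.
    ${DRY_RUN:+echo} gs \
        -o "$out" \
        -sDEVICE=pdfwrite \
        -dDownsampleColorImages=true -dColorImageResolution="$dpi" \
        -dDownsampleGrayImages=true  -dGrayImageResolution="$dpi" \
        -dDownsampleMonoImages=true  -dMonoImageResolution="$dpi" \
        "$in"
}

# Print the command that would be run for 300 dpi:
DRY_RUN=1 downsample_pdf 300 input.pdf output.pdf
```

Without DRY_RUN the same call runs Ghostscript directly, which of course requires gs to be installed.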
I'm wondering if your PDFs are image-heavy, and if this sort of conversion does unwelcome things (e.g. re-sampling images with the 'wrong' parameters) which increase the file size...
If this is the case (image-heavy PDF), say so, and I'll update this answer with a few suggestions....
I had a look at the sample file provided by DNA. Interesting...
No, it does not contain any image.
Instead, it contains one large stream (compressed using /FlateDecode) which consists of roughly 700,000+ (!!) operations, mostly single vector operations in PDF language, such as:
m (moveto),
l (lineto),
d (setdash),
w (setlinewidth),
S (stroke),
s (closepath and stroke),
W* (eoclip),
rg and RG (setrgbcolor)
and a few more.
(That PDF code is very inefficiently written AFAICS (but does its job): it concatenates many short strokes instead of drawing 'long' ones, nearly every stroke defines the color again (even if it is the same as before), and each one carries all the other overhead (start stroke, end stroke, ...).)
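If you want to get a rough operator count like this for your own files, here is one possible sketch. It assumes the content stream has already been decompressed into plain text (for example with qpdf's QDF mode, which is a different tool from the ones used in this answer). The function name count_ops is made up, and the count is only approximate, because any token that merely looks like an operator is counted too.

```shell
#!/bin/sh
# Sketch: count common PDF path/paint operators in uncompressed stream text
# read from stdin. count_ops is a hypothetical helper name.
count_ops() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^(m|l|d|w|S|s|W\*|rg|RG)$/) n[$i]++
    }
    END { for (op in n) printf "%s %d\n", op, n[op] }' | LC_ALL=C sort
}

# Tiny hand-written example stream: two short strokes and one color change.
printf '0 0 m\n10 10 l\nS\n1 0 0 RG\n0 0 m\n5 5 l\nS\n' | count_ops

# With qpdf installed, something like this should work on a real file:
#   qpdf --qdf --object-streams=disable input.pdf - | count_ops
```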
Ghostscript's -dPDFSETTINGS=/screen does not have any effect here (there are no images to downsample, for example). The increased file size (+48 kByte to be precise) is probably due to Ghostscript writing some of the internal stroking etc. commands in a different order when it interprets the file.
So there is not much you can do about the PDF file size ... unless you give up the vector format and convert the page to a raster image, for example:
gs \
-o out72.png \
-sDEVICE=pngalpha \
L_2lambda_max_1wl_E0_1_zg.pdf
(I used the pngalpha output to get a transparent background.) The image dimensions of 'out72.png' are 259x213px, and the file size is now 70 kByte. But I'm sure you'll not like the quality :-)
The output quality is 'bad' because Ghostscript uses a default resolution of 72 dpi.
Since you said you'd like to have 300dpi, the command becomes this:
gs \
-o out300.png \
-sDEVICE=pngalpha \
-r300 \
L_2lambda_max_1wl_E0_1_zg.pdf
The file size is now 750 kByte, and the image dimensions are 1080x889 pixels.
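The pixel dimensions scale linearly with the resolution, so you can estimate them before rendering. A small sketch (scale_px is a made-up helper name, and the result is only an estimate: the real page size in points is fractional, so Ghostscript's output may differ by a pixel):

```shell
#!/bin/sh
# Sketch: estimate a pixel dimension at a new resolution from the
# dimension measured at a known resolution. scale_px is hypothetical.
# Usage: scale_px PIXELS FROM_DPI TO_DPI
scale_px() {
    px="$1"; from="$2"; to="$3"
    # Integer arithmetic, rounded up to whole pixels.
    echo $(( (px * to + from - 1) / from ))
}

scale_px 259 72 300   # width estimate:  1080
scale_px 213 72 300   # height estimate: 888 (Ghostscript reported 889)
```

The same helper gives a quick idea of what -r400 or -r600 would produce before you spend time rendering.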
Since Curiosity is en vogue these days... :-) ...I tried to bring down the file size with the help of Adobe Acrobat X Pro on Mac.
You wanna know the results?
Performing a 'Save as... (PDF with reduced filesize)' -- which for me in the past always yielded very good results! -- created a 1.8+ MByte file (+29%). I guess this definitely puts Ghostscript's performance (file size increase +3%) into a realistic perspective!
Upvotes: 17
Reputation: 90335
DNA decided to go for grayscale PNGs. The way he's creating them is in two steps:
1. Convert the color PDF to a grayscale PDF, using Ghostscript's pdfwrite device and the settings -dColorConversionStrategy=/Gray and -dProcessColorModel=/DeviceGray.
2. Convert the grayscale PDF to PNG, using Ghostscript's pngalpha device at a resolution of 300 dpi (-r300 on the GS commandline).
This reduces his initial file size of 1.4 MB to 0.7 MB.
But this workflow has the following disadvantage:
There are 2 alternatives to DNA's workflow:
A one-step conversion of (color) PDF -> (color) PNG, using Ghostscript's pngalpha device with the original PDF as input (same settings of 300 dpi resolution). This would have this advantage:
A one-step conversion of (color) PDF -> grayscale PNG, using Ghostscript's pnggray device with the original PDF as input (same settings of 300 dpi resolution), with this mix of advantage/disadvantage:
So you can make up your mind and see the output sizes and quality side-by-side, here is a shell script to demonstrate the differences:
#!/bin/bash
#
# Copyright (c) 2012 <[email protected]>
# License: Creative Commons (CC BY-SA 3.0)

# Print a command line, then execute it.
function echo_do()
{
    echo
    echo "Command: ${*}"
    echo "--------"
    echo
    "${@}"
}

[ -d out ] || mkdir out

echo
echo " We assume all PDF pages are 1-page PDFs!"
echo " (otherwise we'd have to include something like '%03d'"
echo " into the output filenames in order to get paged output)"
echo

# Double quotes here, because the text itself contains single quotes.
echo "
 # Convert Color PDF to Grayscale PDF.
 # If PDF has transparent background (most do),
 # this will remain transparent in output.
 # ATTENTION: since we don't use a resolution,
 # pdfwrite will use its default value of '-r720'.
 # (However, this setting will only affect raster objects...)
 "
for i in *.pdf
do
    echo_do gs \
        -o "out/${i}---pdfwrite-devicegray-gs.pdf" \
        -sDEVICE=pdfwrite \
        -dColorConversionStrategy=/Gray \
        -dProcessColorModel=/DeviceGray \
        -dCompatibilityLevel=1.4 \
        "${i}"
done

echo '
 # Convert (previously generated) grayscale PDF to PNG using Alpha channel
 # (Alpha channel can make backgrounds transparent)
 '
for i in out/*pdfwrite-devicegray*.pdf
do
    echo_do gs \
        -o "out/$(basename "${i}")---pngalpha-from-pdfwrite-devicegray-gs.png" \
        -sDEVICE=pngalpha \
        -r300 \
        "${i}"
done

echo '
 # Convert (color) PDF to grayscale PNG using Alpha channel
 # (Alpha channel can make backgrounds transparent)
 '
for i in *.pdf
do
    # Following only required for 'pdfwrite' output device, not for 'pngalpha'!
    #   -dProcessColorModel=/DeviceGray
    echo_do gs \
        -o "out/${i}---pngalphagray_gs.png" \
        -sDEVICE=pngalpha \
        -dColorConversionStrategy=/Gray \
        -r300 \
        "${i}"
done

echo '
 # Convert (color) PDF to (color) PNG using Alpha channel
 # (Alpha channel can make backgrounds transparent)
 '
for i in *.pdf
do
    echo_do gs \
        -o "out/${i}---pngalphacolor_gs.png" \
        -sDEVICE=pngalpha \
        -r300 \
        "${i}"
done

echo '
 # Convert (color) PDF to grayscale PNG
 # (no Alpha channel here, therefore [mostly] white backgrounds)
 '
for i in *.pdf
do
    echo_do gs \
        -o "out/${i}---pnggray_gs.png" \
        -sDEVICE=pnggray \
        -r300 \
        "${i}"
done

echo " All output to be found in ./out/ ..."
echo
Run this script and compare the different outputs side by side.
Yes, the 'direct-grayscale-PNG-from-color-PDF-using-pnggray-device' one may look a bit worse than the other one (and it doesn't sport the transparent background) -- but it is also only 20% of its file size. On the other hand, if you want to buy a bit more quality by sacrificing a bit of disk space -- you could use -r400 instead of -r300 ...
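To compare the file sizes without clicking through the output directory, a small helper along these lines may be handy. This is just a sketch: size_report is a made-up name, and it simply lists every regular file in a directory with its size in bytes, smallest first.

```shell
#!/bin/sh
# Sketch: list the files in a directory with their sizes in bytes,
# smallest first. size_report is a hypothetical helper name.
# Usage: size_report DIR
size_report() {
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # $(( ... )) strips the padding some wc implementations add.
        printf '%d %s\n' "$(( $(wc -c < "$f") ))" "$(basename "$f")"
    done | sort -n
}

# Report on the demo script's output directory, if it exists.
if [ -d out ]; then
    size_report out
fi
```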
Upvotes: 3