Stephen

Reputation: 115

Converting multi-page PDFs to several JPGs using ImageMagick and/or GhostScript

I am trying to convert a multi-page PDF file into a set of JPEGs, one for each page in the PDF. I have spent hours looking up how to do this, and eventually discovered that I need Ghostscript installed. So I installed it (from http://downloads.ghostscript.com/public/, using the most recent link, "ghostscript-9.05.tar.gz" from Feb 8, 2012).

However, even with this installed/downloaded, I am still unable to do what I want. Should I have this saved somewhere special, like in the same folder as ImageMagick?

What I have figured out so far is this:

This is followed by a giant blob of error. It begins with:

    Unrecoverable error: rangecheck in .setuserparams
    Operand stack:

Followed by a blurb of unreadable numbers and caps. It ends with:

    While reading gs_lev2.ps:
    %%[ Error: invalidaccess; OffendingCommand: put ]%%

Needless to say, after hours and hours of deliberation, I don't think I am any closer to doing the seemingly simple task of converting this PDF into a JPG.

What I would like are some step by step instructions on how to make this work. Don't leave out anything, no matter how "obvious" it might seem (especially anything involving ghostscript). This has been troubling me and my supervisor for months now.

For further clarification, we are on a Windows XP operating system. The eventual intention is to call these command lines in R, the statistical language, and run it in a script. In addition, I have been able to successfully convert JPGs to PNG format and vice versa, but PDF just is not working.

Help!!!

Upvotes: 5

Views: 11339

Answers (2)

Jason Malcolm

Reputation: 181

There are numerous SaaS services that will do this for you too. HyPDF and Blitline come to mind.

Upvotes: -3

Kurt Pfeifle

Reputation: 90315

You don't need ImageMagick for this; Ghostscript can do it all on its own. (Even if you used ImageMagick, it couldn't do the conversion itself: it has to use Ghostscript as its 'delegate'.)

Try this for directly using Ghostscript:

    c:\path\to\gswin32c.exe ^
      -o page_%03d.jpg ^
      -sDEVICE=jpeg ^
      d:/path/to/input.pdf

This will create a new JPEG for each page, and the filenames will increment as page_001.jpg, page_002.jpg,...

Note that these JPEGs are created with the default settings of the jpeg device. One of the most important defaults is the resolution, which is 72dpi.
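Since the question mentions running this from a script, here is a minimal sketch of the same basic command assembled as an argument list. It is written in Python purely for illustration (the same idea carries over to R's system calls); the function names `build_gs_command` and `convert` are hypothetical, and the paths are the placeholders from the command above.

```python
import subprocess

def build_gs_command(gs_exe, input_pdf,
                     output_pattern="page_%03d.jpg", device="jpeg"):
    """Return the Ghostscript argument list for a one-file-per-page conversion."""
    return [
        gs_exe,
        "-o", output_pattern,      # %03d is replaced by the page number
        "-sDEVICE=" + device,      # output device, here the jpeg device
        input_pdf,
    ]

def convert(gs_exe, input_pdf, **options):
    """Run Ghostscript and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(gs_exe, input_pdf, **options), check=True)

# Placeholder paths, as in the command above:
print(build_gs_command(r"c:\path\to\gswin32c.exe", "d:/path/to/input.pdf"))
```

Passing a list of arguments bypasses the command interpreter, so `%03d` reaches Ghostscript unchanged. If you put the command into a `.bat` file instead, the percent sign has to be doubled (`page_%%03d.jpg`).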

If you need a higher (or lower) resolution for your images, you can add other options:

    gswin32c.exe ^
      -o page_%03d.jpg ^
      -sDEVICE=jpeg ^
      -r300 ^
      -dJPEGQ=100 ^
      d:/path/to/input.pdf

-r300 sets the resolution to 300dpi and -dJPEGQ=100 sets the highest JPEG quality level (Ghostscript's default is 75).
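To get a feel for what a given -r value means for the output, here is a small worked calculation. It assumes a US Letter page (612 x 792 points, an example page size not taken from the question) and relies on PDF page sizes being measured in points, 72 to the inch. The helper name `pixel_size` is made up for this sketch, and Ghostscript's own rounding may differ by a pixel for page sizes that do not divide evenly.

```python
POINTS_PER_INCH = 72  # PDF/PostScript page sizes are given in points

def pixel_size(width_pt, height_pt, dpi):
    """Approximate raster size in pixels for a page rendered at `dpi`."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))

# US Letter page, 612 x 792 points (assumed example size)
print(pixel_size(612, 792, 72))    # (612, 792)   default resolution
print(pixel_size(612, 792, 300))   # (2550, 3300) with -r300
```

So going from the default to -r300 multiplies each dimension by about 4.2 and the pixel count by about 17, which is worth keeping in mind for file size and conversion time.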

Also note, please: JPEG is not well suited to represent shapes with sharp edges and high contrast in good quality (such as you typically see in black-on-white text pages with small characters).

The (lossy) JPEG compression method is optimized for continuous-tone pictures and photos, not for line graphics. It is therefore a poor fit for PostScript or PDF input pages that mainly contain text: the lossy compression will degrade the output even if the input is excellent. See also the JPEG FAQ for more details on this topic.

You may get better image output by choosing PNG as the output format (PNG uses a lossless compression):

    gswin32c.exe ^
      -o page_%03d.png ^
      -sDEVICE=png16m ^
      -r150 ^
      d:/path/to/input.pdf

The png16m device produces 24-bit RGB color. You could swap this for pnggray (pure grayscale output), png256 (8-bit color), png16 (4-bit color), pngmono (black and white only) or pngmonod (an alternative black-and-white device).
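As a rough way to compare these devices, the sketch below estimates the uncompressed raster size of one page for each of them. The bit depths for png16m, png256 and png16 come from the list above; 8 bits per pixel for pnggray and 1 bit for the two monochrome devices are assumptions of this sketch. The 1275 x 1650 pixel page is a US Letter page at -r150 (an assumed example), and the actual PNG files will be smaller because PNG compresses the data.

```python
# Bits per pixel for each PNG device mentioned above
# (pnggray = 8 and pngmono/pngmonod = 1 are assumptions of this sketch)
BITS_PER_PIXEL = {
    "png16m": 24,
    "pnggray": 8,
    "png256": 8,
    "png16": 4,
    "pngmono": 1,
    "pngmonod": 1,
}

def raster_bytes(width_px, height_px, device):
    """Uncompressed size in bytes, with each row padded to a whole byte."""
    bits = BITS_PER_PIXEL[device]
    row_bytes = (width_px * bits + 7) // 8
    return row_bytes * height_px

# US Letter at 150dpi is 1275 x 1650 pixels (assumed example page)
for device in ("png16m", "pnggray", "png16", "pngmono"):
    print(device, raster_bytes(1275, 1650, device))
# png16m 6311250
# pnggray 2103750
# png16 1052700
# pngmono 264000
```

For scanned text or line drawings, one of the lower bit-depth devices can cut the amount of data considerably, at the cost of color fidelity.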

Upvotes: 17
