JStroop

Reputation: 475

High-res images from PDFS

I'm working on a project in which I need to extract a TIFF per page from multi-page PDFs. The PDFs contain images only and there is one image per page (I believe they were made on some kind of photocopier/scanner, but haven't confirmed this). The TIFFs are then used to create several other derivative versions of the document so the higher the resolution the better.

I've found two recipes, both with helpful aspects, but neither is ideal. Hoping someone can help me tune one of them, or offer a third option.

Recipe 1, pdfimages and ImageMagick:

First do:

$ pdfimages $MY_PDF.pdf foo

This results in several .pbm files (foo-000.pbm, foo-001.pbm, etc.).

Then for each *.pbm do:

$ convert $each -resize 3200x3200\> -quality 100 $new_name.tif

Pro: The resultant TIFFs are a healthy 3300+ pixels on the long dimension (-resize just serves to normalize everything).

Con: The orientation of the pages is lost, and they come out rotated in different directions (they follow logical patterns, so they are probably in the orientation in which they were fed to the scanner?).
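For reference, the two steps of this recipe can be combined into one small function (just a sketch; the function name and file paths are placeholders, and it assumes pdfimages from poppler-utils and ImageMagick's convert are installed):

```shell
#!/bin/bash
# Sketch of Recipe 1: extract the embedded images, then normalize their size.
extract_and_resize() {
    local pdf="$1"
    local prefix="${pdf%.pdf}"

    pdfimages "$pdf" "$prefix"      # writes $prefix-000.pbm, $prefix-001.pbm, ...

    local pbm
    for pbm in "$prefix"-*.pbm; do
        [ -e "$pbm" ] || continue   # skip if the glob matched nothing
        # The '>' flag means: only shrink images larger than 3200px; never enlarge.
        convert "$pbm" -resize '3200x3200>' -quality 100 "${pbm%.pbm}.tif"
    done
}
```

Single-quoting the -resize geometry avoids having to backslash-escape the > for the shell.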

Recipe 2, ImageMagick solo:

convert +adjoin $MY_PDF.pdf pages.tif

This gives me a TIFF per page (pages-0.tif, pages-1.tif, etc.).

Pro: Orientation stays!

Con: The long dimension of the resultant file is < 800 px, which is too small to be useful, and it looks as though there is some compression applied.

How can I ditch the scaling of the image stream in the PDF, but retain the orientation? Is there some more magick in ImageMagick that I'm missing? Something else entirely?

Upvotes: 2

Views: 3119

Answers (2)

Betagan

Reputation: 121

Sorry for the noise on this old topic, but Google took me here as one of the top results and it might take others, so I thought I'd post the solution to the OP's question that I found here: http://robfelty.com/2008/03/11/convert-pdf-to-png-with-imagemagick

In short: you have to tell ImageMagick at what density it should rasterize the PDF.

So convert -density 600x600 foo.pdf foo.png will tell ImageMagick to treat the PDF as if it had a resolution of 600 dpi and thus output much larger PNGs. In my case, the resulting foo.png was 5000x6600 px. You can optionally add -resize 3000x3000 (or whatever size you require) and it will be scaled down.
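Applied to the asker's TIFF-per-page workflow, the same idea might look like this (a sketch; the input filename and the 600 dpi / 3000 px figures are just examples, guarded so the command only runs when the tool and file are actually present):

```shell
#!/bin/bash
# Sketch: rasterize each PDF page at high density, then cap the output size.
pdf="scan.pdf"   # hypothetical input file
if command -v convert >/dev/null && [ -f "$pdf" ]; then
    # -density must come BEFORE the input file to affect rasterization;
    # %d in the output name yields pages-0.tif, pages-1.tif, ...
    convert -density 600 "$pdf" -resize '3000x3000>' +adjoin pages-%d.tif
fi
```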

Note that as long as your PDF contains only vector graphics or text, the density can be set as high as needed. If the PDF contains rasterized images, setting the density higher than those images' dpi won't make them look any better, surprise! :)

Chris

Upvotes: 2

JStroop

Reputation: 475

I wanted to share my solution...it may not work for everyone, but since nothing else has come along, maybe it will help someone. I wound up going with the first option from my question: using pdfimages to get large images that were rotated every which way. I then found a way to use OCR and word counts to guess the orientation, which took me from (an estimated) 25% of pages rotated correctly to above 90%.

The flow is as follows:

  1. Use pdfimages (apt-get install poppler-utils) to get a set of pbm files (not shown below).
  2. For each file:
    1. Make four versions, rotated 0, 90, 180, and 270 degrees (I refer to them as "north", "east", "south", and "west" in my code).
    2. OCR each. The two with the lowest word count are likely the right-side up and upside down versions. This was over 99% accurate in my set of images processed to date.
    3. From the two with the lowest word count, run the OCR output through a spell check. The file with the fewest spelling errors (i.e. the most recognizable words) is likely to be correct. For my set this was about 93% accurate (up from 25%), based on a sample of 500.

YMMV. My files are bitonal and highly textual; the source images average 3300 px on the long side. I can't speak to greyscale or color, or to files with a lot of images. Most of my source PDFs are bad scans of old photocopies, so the accuracy might be even better with cleaner files. Using -despeckle during the rotation made no difference and slowed things down considerably (~5×). I chose ocrad for speed rather than accuracy, since I only need rough numbers and am throwing away the OCR output anyway. Re: performance, my nothing-special Linux desktop can run the whole script at about 2-3 files per second.

Here's the implementation in a simple bash script:

#!/bin/bash
# Rotates a pbm file in place.

# Pass a .pbm as the only arg.
file="$1"

TMP="/tmp/rotation-calc"
mkdir -p "$TMP"

# Dependencies:                                                                 
# convert: apt-get install imagemagick                                          
# ocrad: sudo apt-get install ocrad                                               
ASPELL="/usr/bin/aspell"
AWK="/usr/bin/awk"
BASENAME="/usr/bin/basename"
CONVERT="/usr/bin/convert"
HEAD="/usr/bin/head"
OCRAD="/usr/bin/ocrad"
SORT="/usr/bin/sort"
WC="/usr/bin/wc"

# Make copies in all four orientations (the src file is north; copy it to make 
# things less confusing)
file_name=$($BASENAME "$file")
north_file="$TMP/$file_name-north"
east_file="$TMP/$file_name-east"
south_file="$TMP/$file_name-south"
west_file="$TMP/$file_name-west"

cp "$file" "$north_file"
$CONVERT -rotate 90 "$file" "$east_file"
$CONVERT -rotate 180 "$file" "$south_file"
$CONVERT -rotate 270 "$file" "$west_file"

# OCR each (just append ".txt" to the path/name of the image)
north_text="$north_file.txt"
east_text="$east_file.txt"
south_text="$south_file.txt"
west_text="$west_file.txt"

$OCRAD -f -F utf8 "$north_file" -o "$north_text"
$OCRAD -f -F utf8 "$east_file" -o "$east_text"
$OCRAD -f -F utf8 "$south_file" -o "$south_text"
$OCRAD -f -F utf8 "$west_file" -o "$west_text"

# Get the word count for each txt file (least 'words' == least whitespace junk
# resulting from vertical lines of text that should be horizontal.)
wc_table="$TMP/wc_table"
echo "$($WC -w $north_text) $north_file" > $wc_table
echo "$($WC -w $east_text) $east_file" >> $wc_table
echo "$($WC -w $south_text) $south_file" >> $wc_table
echo "$($WC -w $west_text) $west_file" >> $wc_table

# Take the bottom two; these are likely right side up and upside down, but 
# generally too close to call beyond that.
bottom_two_wc_table="$TMP/bottom_two_wc_table"
$SORT -n "$wc_table" | $HEAD -2 > "$bottom_two_wc_table"

# Spellcheck. The lowest number of misspelled words is most likely the 
# correct orientation.
misspelled_words_table="$TMP/misspelled_words_table"
while read -r record; do
    txt=$(echo "$record" | $AWK '{ print $2 }')
    misspelled_word_count=$($ASPELL -l en list < "$txt" | $WC -w)
    echo "$misspelled_word_count $record" >> "$misspelled_words_table"
done < "$bottom_two_wc_table"

# Sort by misspelled-word count; the best-scoring orientation wins and
# replaces the input file below.
winner=$($SORT -n "$misspelled_words_table" | $HEAD -1)
rotated_file=$(echo "$winner" | $AWK '{ print $4 }')

mv "$rotated_file" "$file"

# Clean up.
if [ -d "$TMP" ]; then
    rm -r "$TMP"
fi
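The script above processes a single file in place; a driver to run it over every page extracted by pdfimages might look like the following (a sketch; the script filename and the foo- prefix are hypothetical):

```shell
#!/bin/bash
# Hypothetical driver: fix the orientation of every extracted page in place.
# Assumes the rotation script above is saved as ./fix-rotation.sh and the
# pbm files were produced by `pdfimages my.pdf foo`.
for pbm in foo-*.pbm; do
    [ -e "$pbm" ] || continue   # skip if the glob matched nothing
    ./fix-rotation.sh "$pbm"
done
```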

Upvotes: 2
