Reputation: 49743
I'd like to take a PDF file and convert it to images, each PDF page becoming a separate image.
"Convert a .doc or .pdf to an image and display a thumbnail in Ruby?" is a similar post, but it doesn't cover how to make separate images for each page.
Upvotes: 9
Views: 23553
Reputation: 345
(I know this is an ancient question. I found it while searching for a solution, so here's an easy one for others who find it the same way.)
For PDF files, I think the best I've found (after a long search!) is the command-line program pdftoppm, part of the excellent poppler package of PDF utilities.
I just generated first-page thumbnails for 20,740 PDF files. I started it and went to bed. Don't know when it finished, but running it in a sub-directory was fairly snappy.
You can easily use it from a Ruby script as a shell command. I used the following to generate 1/3rd size thumbs that fit in a 64 kB BLOB field of MySQL:
# query contains 'INSERT INTO `Library` (' and other column names
# fname contains the name of the file to convert
# sql contains other values to insert into the database
query << '`Thumb`,'
cmd = '/usr/local/bin/pdftoppm -singlefile -jpeg -jpegopt quality=50,optimize=y -r 50 "' + fname + '" /tmp/book-files-to-database'
`#{cmd}`
`chmod 755 /tmp/book-files-to-database.jpg`
sql.chop!
sql << "LOAD_FILE('/private/tmp/book-files-to-database.jpg'), " + "'"
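For the original question (one image per page), the same tool can be run without -singlefile, in which case it writes one numbered file per page. Here is a minimal sketch, assuming pdftoppm is on the PATH. The helper names pdftoppm_command and convert_pdf_pages and the output prefix are made up for illustration. Passing the arguments to system as separate strings avoids the manual quoting around the file name.

```ruby
# Build the pdftoppm argument list for converting every page of a PDF
# to a separate JPEG. Hypothetical helper; assumes pdftoppm is on the PATH.
def pdftoppm_command(pdf_path, output_prefix, dpi: 50)
  ["pdftoppm", "-jpeg", "-r", dpi.to_s, pdf_path, output_prefix]
end

# Run the conversion. Returns true on success, false or nil otherwise.
# Calling system with separate arguments bypasses the shell, so spaces
# or quotes in file names need no escaping.
def convert_pdf_pages(pdf_path, output_prefix, dpi: 50)
  system(*pdftoppm_command(pdf_path, output_prefix, dpi: dpi))
end

# Example (not run here):
#   convert_pdf_pages("book.pdf", "/tmp/book-page")
#   Dir.glob("/tmp/book-page-*.jpg").sort  # one file per page
```

If you only need some pages, pdftoppm also accepts -f and -l for the first and last page to convert.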
Upvotes: 0
Reputation: 4927
Using RMagick itself, you can create images for different pages:
require 'RMagick'
pdf_file_name = "test.pdf"
im = Magick::Image.read(pdf_file_name)
The code above gives you an array, im, with one entry per page. Do this if you want to generate a JPEG image of the fifth page:
im[4].write(pdf_file_name + ".jpg")
But this will load the entire PDF, so it can be slow.
Alternatively, if you want to create an image of the fifth page and don't want to load the complete PDF file:
require 'RMagick'
pdf_file_name = "test.pdf"
# Page indices are zero-based, so [4] selects the fifth page.
im = Magick::Image.read(pdf_file_name + "[4]")
im[0].write(pdf_file_name + ".jpg")
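The bracket suffix that ImageMagick reads counts pages from zero, which is easy to get wrong. Here is a small sketch of a helper that turns a 1-based page number into that suffix. page_spec is a hypothetical name of my own, not part of RMagick.

```ruby
# Hypothetical helper: build an ImageMagick page specifier from a
# 1-based page number. ImageMagick's bracket index is zero-based.
def page_spec(pdf_path, page_number)
  raise ArgumentError, "page_number must be >= 1" if page_number < 1
  "#{pdf_path}[#{page_number - 1}]"
end

# Example with RMagick (not run here):
#   require 'rmagick'  # 'RMagick' on older gem versions
#   image = Magick::Image.read(page_spec("test.pdf", 5)).first
#   image.write("test-page-5.jpg")
```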
Upvotes: 59
Reputation: 30143
ImageMagick can do that with PDFs. Presumably RMagick can do it too, but I'm not familiar with it.
The code from the post you linked to:
require 'RMagick'
pdf = Magick::ImageList.new("doc.pdf")
pdf is an ImageList object, which according to the documentation delegates many of its methods to Array. You should be able to iterate over pdf and call write to write the individual images to files.
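A rough sketch of that iteration, assuming the ImageList really does behave like an enumerable of images that each respond to write. write_pages and the file naming scheme are my own invention for illustration.

```ruby
# Write each page to its own numbered file and return the file names.
# `pages` can be any enumerable whose items respond to #write,
# such as a Magick::ImageList read from a PDF.
def write_pages(pages, prefix, ext = "jpg")
  pages.each_with_index.map do |page, index|
    name = format("%s-%03d.%s", prefix, index + 1, ext)
    page.write(name)
    name
  end
end

# Example with RMagick (not run here):
#   require 'rmagick'  # 'RMagick' on older gem versions
#   pdf = Magick::ImageList.new("doc.pdf")
#   write_pages(pdf, "doc")
```

The zero-padded page number keeps the files in page order when they are sorted by name.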
Upvotes: 22
Reputation: 106097
Since I can't find a way to deal with PDFs on a per-page basis in RMagick, I'd recommend first splitting the PDF into pages with pdftk's burst command, then dealing with the individual pages in RMagick. This is probably less performant than an all-in-one solution, but unfortunately no all-in-one solution presents itself.
There's also PDF::Toolkit for Ruby that hooks into pdftk but I've never used it.
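The splitting step could look something like this from Ruby. This is only a sketch, assuming pdftk is installed and on the PATH. pdftk_burst_command and the page_%04d.pdf naming pattern are hypothetical choices, not anything pdftk requires.

```ruby
# Build the pdftk command that splits a PDF into one file per page.
# Hypothetical helper; assumes pdftk is on the PATH. pdftk fills in
# the %04d in the output pattern with the page number.
def pdftk_burst_command(pdf_path, output_dir)
  pattern = File.join(output_dir, "page_%04d.pdf")
  ["pdftk", pdf_path, "burst", "output", pattern]
end

# Example (not run here):
#   system(*pdftk_burst_command("doc.pdf", "/tmp/pages"))
#   single_pages = Dir.glob("/tmp/pages/page_*.pdf").sort
#   # each entry can then be read and converted with RMagick
```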
Upvotes: 2