Jakob
Jakob

Reputation: 268

Storing jpg images into a pdf file in a "lossless" way

Given a directory with several jpg files (photos), I would like to create a single pdf file with one photo per page. However, I would like the photos to be stored in the pdf file unchanged; i.e., I would like to avoid decoding and recoding. So ideally I would like to be able to extract the original jpg files (maybe minus the metadata) from the pdf file, using, e.g., a linux command line too like pdfimages.

My ideas so far:

Upvotes: 8

Views: 3787

Answers (4)

K J
K J

Reputation: 11939

Depending on what you wish to do with the files, on windows, if the images are simpler jpeg/gif/tif/png you can store in a cbz, zip, folder or zipped folder and view with SumatraPDF which has the SaveAs PDF option thus all done with one exe.

enter image description here

It will fail with files that are viewable but not acceptable as PDF inputs such as webp or heic, so check in the viewer what the filename extension is before.

It should in practically all cases be lossless, however you should roundtrip with pdfimage -all to do a file compare between input and output to check there was no need to convert any bytes.

Upvotes: 1

c72578
c72578

Reputation: 55

Another possibility for storing jpg images into a pdf file in a "lossless" way is provided by PoDoFo:

podofoimg2pdf is able to perform lossless conversion from JPEG to PDF by embedding the jpg file into the pdf container.

podofoimg2pdf
Usage: podofoimg2pdf [output.pdf] [-useimgsize] [image1 image2 image3 ...]

Options:
 -useimgsize    Use the imagesize as page size, instead of A4

Upvotes: 3

Geremia
Geremia

Reputation: 5656

img2pdf (PyPI page):

Losslessly convert raster images to PDF without re-encoding PNG, JPEG, and JPEG2000 images. This leads to a lossless conversion of PNG, JPEG and JPEG2000 images with the only added file size coming from the PDF container itself. Other raster graphics formats are losslessly stored using the same encoding that PNG uses. Since PDF does not support images with transparency and since img2pdf aims to never be lossy, input images with an alpha channel are not supported.

(pdfimages -all does the exact opposite.)

Upvotes: 15

gettalong
gettalong

Reputation: 830

You could use the following small script which relies on HexaPDF (note: I'm the author of HexaPDF) to do this.

Note: Make sure you have Ruby 2.4 installed, then run gem install hexapdf to install hexapdf.

Here is the script:

require 'hexapdf'

doc = HexaPDF::Document.new

ARGV.each do |image_file|
  image = doc.images.add(image_file)
  page = doc.pages.add
  iw = image.info.width.to_f
  ih = image.info.height.to_f                                                                                                                             
  pw = page.box(:media).width.to_f
  ph = page.box(:media).height.to_f
  rw, rh = pw / iw, ph / ih
  ratio = [rw, rh].min
  iw, ih = iw * ratio, ih * ratio
  x, y = (pw - iw) / 2, (ph - ih) / 2
  page.canvas.image(image, at: [x, y], width: iw, height: ih)
end

doc.write('images.pdf')

Just supply the images as arguments on the command line, the output file will be named images.pdf. Most of the code deals with centering and scaling the images to nicely fit onto the pages.

Upvotes: 2

Related Questions