Bernd

Reputation: 3418

How does PS/PDF store and compress bitmaps?

I am experimenting with a system that scans letters and converts the scanned bitmaps to PDF, with the goal of combining high resolution with a small PDF file size.

I am prototyping with a scanner, GIMP for bitmap manipulation and ImageMagick for bitmap-to-PDF conversion.

My process looks as follows:

To make the image compress even better, I could make the bitmap itself more compression-friendly. Before experimenting with this, I would like to know how PS/PDF stores bitmaps.

Are bitmaps in PS/PDF run-length encoded? If so, I would gain compression by removing single pixels from bitmap rows.
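
To illustrate the idea, here is a toy sketch of run-length encoding one bitmap row. It is only an assumption about how a run-based encoder behaves, not the actual PDF filter (PDF's own `RunLengthDecode` works on bytes, and CCITT fax coding works on pixel runs); the helper name `run_lengths` and the sample rows are made up for the example.

```python
from itertools import groupby

def run_lengths(row):
    """Return (pixel_value, run_length) pairs for one bitmap row."""
    return [(value, len(list(group))) for value, group in groupby(row)]

# A blank 16-pixel row, and the same row with one stray black pixel.
clean_row = [0] * 16
noisy_row = [0] * 7 + [1] + [0] * 8

clean_runs = run_lengths(clean_row)
noisy_runs = run_lengths(noisy_row)

# The stray pixel splits one long run into three shorter ones.
print(len(clean_runs), clean_runs)
print(len(noisy_runs), noisy_runs)
```

In this toy model a single isolated pixel triples the number of runs in the row, which is why despeckling looks promising for any run-based scheme.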

Do you have ideas for further optimization?

Do you know of any references on the bitmap storage format in PS/PDF?

Upvotes: 0

Views: 3459

Answers (5)

msr

Reputation: 346

A few companies (Luratech and CamiNova are the only ones I know of) implement a "Mixed Raster Content" model in PDF. The files are viewable in the standard Adobe Reader but are very, very small -- comparable to DjVu.

"Mixed Raster Content" means they segment the image into a high resolution B&W mask (hard edges, lines, letters) and lower resolution smooth tone image (background pictures). The mask gets stored using a bitonal compression algorithm (probably JBIG2) and the smooth tone image gets compressed using JP2K (probably).

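A rough sketch of that segmentation step, assuming a simple fixed threshold and block averaging. Commercial MRC products certainly do something more sophisticated; the function name `split_mrc`, the threshold and the sample page are all invented for illustration.

```python
def split_mrc(gray, threshold=128, factor=2):
    """Split a grayscale image (rows of 0=black..255=white) into a
    full-resolution bitonal mask and a downsampled background."""
    height, width = len(gray), len(gray[0])
    # Mask: 1 where the pixel is dark enough to count as text or line art.
    mask = [[1 if p < threshold else 0 for p in row] for row in gray]
    background = []
    for by in range(0, height, factor):
        out_row = []
        for bx in range(0, width, factor):
            # Average only the pixels that are NOT covered by the mask.
            block = [gray[y][x]
                     for y in range(by, min(by + factor, height))
                     for x in range(bx, min(bx + factor, width))
                     if not mask[y][x]]
            out_row.append(sum(block) // len(block) if block else 255)
        background.append(out_row)
    return mask, background

page = [
    [250, 250, 10, 250],
    [250, 240, 10, 250],
    [250, 250, 250, 250],
    [200, 200, 200, 200],
]
mask, background = split_mrc(page)
print(mask)
print(background)
```

The mask would then go to a bitonal codec and the much smaller background to a continuous-tone codec, which is where the size saving comes from.
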
Upvotes: 2

markee174

Reputation:

The compression method is generally selected by the tool creating the PDF and you may have limited control over that.

If you have Acrobat 9.0, there is a really nice 'hidden' feature that lets you see the object tree inside a PDF (you are interested in the XObjects under Resources). There is a short blog post on using it at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects
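
Without Acrobat, a crude alternative is to scan the raw file for `/Filter` entries to see which stream filters the tool chose. This is only a sketch: it assumes the object dictionaries are stored uncompressed (it will miss anything inside object streams), it counts filters on all streams rather than only images, and the `sample` bytes below are a made-up fragment rather than a real PDF.

```python
import re
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/Name1 /Name2]".
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")

def count_filters(pdf_bytes):
    """Count the filter names that appear after /Filter keys."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts

sample = (
    b"<< /Type /XObject /Subtype /Image /Filter /CCITTFaxDecode >>\n"
    b"<< /Type /XObject /Subtype /Image /Filter [/ASCII85Decode /FlateDecode] >>\n"
)
filters = count_filters(sample)
print(filters)
```

For a real file you would pass `open(path, "rb").read()` instead of `sample`.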

Upvotes: 0

unwind

Reputation: 399753

Adobe's PDF reference might be a good place to start. From a very cursory look, it seems that images are stored uncompressed, but that doesn't feel right at all. PDF can also link to external images, in JPEG for instance.

Upvotes: 0

vartec

Reputation: 134581

For bitmaps, IIRC, PDF uses deflate. But PDF can also store images with more specific image compression algorithms, such as JPEG (lossy), CCITT (lossless), JBIG2 (lossy and lossless) and JPX (i.e. JPEG 2000, lossy and lossless).
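
To get a feel for how deflate reacts to scanner noise, here is a small experiment with Python's `zlib` module (which implements deflate). The page layout, the sizes and the speckle count are arbitrary assumptions, and a real PDF writer may apply predictors or other settings on top.

```python
import random
import zlib

WIDTH_BYTES = 100   # 800 pixels per row at 1 bit per pixel
HEIGHT = 200

def make_page(speckles, seed=0):
    """Build a synthetic 1-bit page with a few solid bars plus random speckles."""
    rng = random.Random(seed)
    page = bytearray(WIDTH_BYTES * HEIGHT)          # all bits zero
    for y in range(20, HEIGHT, 40):                 # bars standing in for text lines
        for x in range(10, 90):
            page[y * WIDTH_BYTES + x] = 0xFF
    for _ in range(speckles):                       # flip isolated single bits
        index = rng.randrange(len(page))
        page[index] ^= 1 << rng.randrange(8)
    return bytes(page)

clean = make_page(speckles=0)
noisy = make_page(speckles=500)

clean_size = len(zlib.compress(clean, 9))
noisy_size = len(zlib.compress(noisy, 9))
print("clean:", clean_size, "bytes; noisy:", noisy_size, "bytes")
```

The speckled page compresses noticeably worse, so cleaning up isolated pixels should help with deflate too, not only with run-length schemes.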

Upvotes: 0

gromgull

Reputation: 1548

PDF supports many types of image compression, see: http://en.wikipedia.org/wiki/Pdf#Raster_images

I think you can specify which one to use with the ImageMagick -compress option: http://www.imagemagick.org/script/command-line-options.php#compress
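
As a sketch, this builds such a command line for a bilevel scan using CCITT Group 4. The file names are hypothetical, the helper `build_convert_command` is invented for the example, and on ImageMagick 7 the binary is `magick` rather than `convert`.

```python
import shlex

def build_convert_command(src, dst, compression="Group4"):
    """Build an ImageMagick command line that sets the output compression.

    Other documented -compress values include None, Fax, JPEG, LZW,
    RLE and Zip.
    """
    return ["convert", src, "-compress", compression, dst]

# Hypothetical file names; nothing is executed here.
command = build_convert_command("scan.png", "scan.pdf")
print(shlex.join(command))
```

The list can be handed to `subprocess.run(command, check=True)` once ImageMagick is installed. Group4 only makes sense for black-and-white input, so the scan would need thresholding first.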

Upvotes: 1
