hnakao11

Reputation: 855

What is the best solution to compress PDF with PDFBox?

I have a PDF file to save, but first I have to compress it while keeping the best possible quality, and I must use an open-source library (like Apache PDFBox®).

So far, what I do is collect all the image resources, compress them, and put them back into the PDF, but the compression ratio is too low. This is just the fragment of the code where I set the compression parameters:

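// pdxObject is one of the page's XObject resources, already known to be an image
PDImageXObject imageXObject = (PDImageXObject) pdxObject;

// Look up the JDK's JPEG writer by format name
ImageWriter imageWriter = ImageIO
      .getImageWritersByFormatName(FileType.JPEG.name().toLowerCase()).next();

// Switch to explicit mode so the quality setting below takes effect
ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
// COMPRESSION_FACTOR is a float from 0.0 (smallest file) to 1.0 (best quality)
imageWriteParam.setCompressionQuality(COMPRESSION_FACTOR);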

Is there some other mechanism to optimize a PDF? So far, compressing only the images gives a rather poor result.

Upvotes: 3

Views: 10333

Answers (1)

Joop Eggen

Reputation: 109547

On compression: images are indeed probably the largest culprits.

Images: The image dimensions (width and height) contribute to the file size too, not only the lossy image quality (your COMPRESSION_FACTOR). In general I would start by compressing a JPEG file outside the PDF. That way you can find the strongest compression that still displays and prints (!) adequately. Photos are best stored as JPEG; vector graphics (like diagrams) can best be done with Encapsulated PostScript.
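To make the point about dimensions concrete, here is a minimal sketch using only the JDK (no PDFBox) that first scales an image down and then writes it as JPEG at a chosen quality. The class name JpegShrinker, the method names, and the maxWidth parameter are my own assumptions for illustration, and the code assumes an RGB image without an alpha channel.

```java
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper: downscale first, then apply lossy JPEG compression.
final class JpegShrinker {

    // Returns the image unchanged if it is already narrow enough,
    // otherwise a copy scaled to maxWidth with the aspect ratio kept.
    static BufferedImage scaleToWidth(BufferedImage src, int maxWidth) {
        if (src.getWidth() <= maxWidth) {
            return src;
        }
        float factor = maxWidth / (float) src.getWidth();
        int height = Math.max(1, Math.round(src.getHeight() * factor));
        BufferedImage dst = new BufferedImage(maxWidth, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, maxWidth, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    // Encodes the image as JPEG; quality runs from 0.0 (smallest) to 1.0 (best).
    static byte[] encodeJpeg(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```

Comparing encodeJpeg(original, q).length against encodeJpeg(scaleToWidth(original, w), q).length for a few values of w and q is a quick way to find settings that still look and print acceptably. With PDFBox 2.x, the scaled BufferedImage could then probably be put back via JPEGFactory.createFromImage(document, image, quality), but I have not tested that part here.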

Repeated images like page logos should not be stored repeatedly. The optimisation here is internet streaming.
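One way to spot repeated images is to hash the encoded bytes of every image and remember where each hash was first seen. The following is only a sketch with JDK classes; DuplicateImageFinder and canonicalIndices are names I made up, and how you extract the bytes from the PDF and re-point the page resources is left open.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds byte-identical images so each can be stored once.
final class DuplicateImageFinder {

    // For every image, returns the index of the first image with identical bytes.
    // An image that has no earlier duplicate maps to its own index.
    static int[] canonicalIndices(List<byte[]> encodedImages) throws NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        Map<String, Integer> firstSeen = new HashMap<>();
        int[] result = new int[encodedImages.size()];
        for (int i = 0; i < encodedImages.size(); i++) {
            // digest(byte[]) hashes the input and resets the digest for the next call
            String key = Base64.getEncoder().encodeToString(sha.digest(encodedImages.get(i)));
            Integer previous = firstSeen.putIfAbsent(key, i);
            result[i] = (previous == null) ? i : previous;
        }
        return result;
    }
}
```

Every index whose result differs from itself marks a duplicate, which could then reference the first copy instead of carrying its own data. Note this only catches exact byte matches, not the same logo re-encoded with different settings.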

Fonts: The standard fonts need no space, fully embedded fonts need the most space (for PDFs with forms, for instance). Subset embedding is a third possibility, storing only the glyphs one needs.

PDF's own binary data: Text and other parts can be uncompressed, compressed using only 7-bit ASCII, or further compressed using all bytes. The ASCII option is a bit outdated.

At the moment I am not using PDFBox, hence I leave that to you.
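To get a feeling for what the "all bytes" compression buys you, here is a small JDK-only sketch that deflates a repetitive page content stream in the zlib format that PDF's FlateDecode filter uses. FlateDemo and its methods are names I invented for the illustration; a real PDF library would apply the filter for you when writing the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical demo of Flate (zlib) compression on stream data.
final class FlateDemo {

    // Compresses raw bytes with the strongest zlib setting.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream dos = new DeflaterOutputStream(out, deflater)) {
            dos.write(raw);
        } finally {
            deflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }

    // Restores the original bytes, to check that nothing was lost.
    static byte[] inflate(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return out.toByteArray();
    }
}
```

Text-heavy content streams repeat the same operators over and over, so they usually shrink a lot. Already compressed data such as JPEG images gains almost nothing from a second Flate pass.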

Upvotes: 1
