david.perez

Reputation: 7032

How to change the filter of an image in a PDF file

I'm building a tool to compress PDF files using pdfbox. Some of the images use the DCTDecode + FlateDecode filters, and I'd like to experiment with the JPXDecode filter to see whether it takes up less space.

I've seen some code that does this with iText, but how can it be done with pdfbox? I've found no documentation on how to do so.

Upvotes: 2

Views: 1734

Answers (2)

Tilman Hausherr

Reputation: 18926

This code replaces the image stream without having to alter COSWriter (which sounds scary). However, with the PDF I tried, the encoded image came out incorrect, which suggests a bug in the JPEG 2000 encoder, so check your result PDFs.

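import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import javax.imageio.ImageIO;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class SO57972743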
{
    public static void main(String[] args) throws IOException
    {
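        // the image is written below, so list the writer formats (a JPEG 2000 writer plugin must be on the classpath)
        System.out.println("supported writer formats: " + Arrays.toString(ImageIO.getWriterFormatNames()));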

        try (PDDocument doc = PDDocument.load(new File("test.pdf")))
        {
            // get 1st level images only here (there may be more in form XObjects!)
            PDResources res = doc.getPage(0).getResources();
            for (COSName name : res.getXObjectNames())
            {
                PDXObject xObject = res.getXObject(name);
                if (xObject instanceof PDImageXObject)
                {
                    replaceImageWithJPX(xObject);
                }
            }
            doc.save("test-result.pdf");
        }
    }

    private static void replaceImageWithJPX(PDXObject xObject) throws IOException
    {
        PDImageXObject img = (PDImageXObject) xObject;
        BufferedImage bim = img.getOpaqueImage(); // the mask (if there) won't be touched
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        boolean written = ImageIO.write(bim, "JPEG2000", baos);
        if (!written)
        {
            System.err.println("write failed");
            return;
        }
        // replace image stream
        try (OutputStream os = img.getCOSObject().createRawOutputStream())
        {
            os.write(baos.toByteArray());
        }
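        img.getCOSObject().setItem(COSName.FILTER, COSName.JPX_DECODE); // replace filter
        img.getCOSObject().removeItem(COSName.DECODE_PARMS); // parameters of the old filter(s) no longer apply
        img.getCOSObject().removeItem(COSName.COLORSPACE); // use the colorspace in the image itself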
    }
}

Upvotes: 2

david.perez

Reputation: 7032

With pdfbox it is possible to compress all images by using a custom COSWriter that handles every image stream and re-encodes it with the JPXDecode filter. pdfbox cannot produce JPEG 2000 data itself, but the JAI library with a plugin can generate a JPEG 2000 image. The compression factor is configurable, and high compression ratios can be achieved without losing too much quality.
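
As a rough illustration of a configurable compression factor, here is a minimal sketch that uses only the standard javax.imageio API. The class name QualityEncoder and its encode method are made up for this example. The JDK ships no JPEG 2000 writer, so the demo uses "jpeg" as a stand-in. Whether a given JPEG 2000 plugin honors setCompressionQuality is an assumption; a plugin may expose its own parameter class instead.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

final class QualityEncoder
{
    /**
     * Encodes the image with the first writer registered for the given format.
     * Returns null if no writer is registered (hypothetical helper, not a pdfbox API).
     */
    static byte[] encode(BufferedImage image, String format, float quality) throws IOException
    {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(format);
        if (!writers.hasNext())
        {
            return null;
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        if (param.canWriteCompressed())
        {
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            String[] types = param.getCompressionTypes();
            if (types != null && types.length > 0 && param.getCompressionType() == null)
            {
                param.setCompressionType(types[0]); // a type must be set before the quality
            }
            param.setCompressionQuality(quality); // 0.0 = smallest, 1.0 = best quality
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ImageOutputStream ios = new MemoryCacheImageOutputStream(baos))
        {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(image, null, null), param);
        }
        finally
        {
            writer.dispose();
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        // noisy test image, so that the quality setting has a visible effect on size
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        Random rnd = new Random(1);
        for (int y = 0; y < 64; y++)
        {
            for (int x = 0; x < 64; x++)
            {
                img.setRGB(x, y, rnd.nextInt(0x1000000));
            }
        }
        System.out.println("quality 0.9: " + encode(img, "jpeg", 0.9f).length + " bytes");
        System.out.println("quality 0.2: " + encode(img, "jpeg", 0.2f).length + " bytes");
    }
}
```

With a JPEG 2000 plugin on the classpath, passing "JPEG2000" as the format name would select that writer instead; the resulting bytes are what would go into the image stream.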

Applying the FlateDecode filter on top yields a little more compression with no quality loss.
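
To check whether the extra FlateDecode pass is worth it for a given image, one can deflate the already encoded bytes and compare sizes. A minimal sketch using only java.util.zip (FlateDecode data is in the zlib/deflate format that Deflater produces by default); the class name FlateGain and its methods are made up for this example, and the input file is whatever encoded image you pass on the command line.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FlateGain
{
    /** Compresses the data in zlib format, as a FlateDecode stream would hold it. */
    static byte[] deflate(byte[] data)
    {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished())
        {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); used here only to show that the step is lossless. */
    static byte[] inflate(byte[] data) throws DataFormatException
    {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished())
        {
            int n = inflater.inflate(buf);
            if (n == 0 && inflater.needsInput())
            {
                inflater.end();
                throw new DataFormatException("truncated input");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 1)
        {
            System.err.println("usage: FlateGain <encoded-image-file>");
            return;
        }
        byte[] encoded = Files.readAllBytes(Paths.get(args[0]));
        byte[] deflated = deflate(encoded);
        System.out.println("encoded:  " + encoded.length + " bytes");
        System.out.println("deflated: " + deflated.length + " bytes");
    }
}
```

If the deflated size is not smaller, the extra filter only adds decoding work and can be skipped for that image.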

Upvotes: 1
