Alessandro Artoni

Reputation: 103

PDFBox: join two PDFs side by side while optimizing disk space

I am using PDFBox to join two PDFs side by side, with the following code:

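import java.awt.geom.AffineTransform;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;

// sourceDoc and targetDoc are PDDocument instances that have already been loaded
PDDocument outDoc = new PDDocument();

// The output has as many pages as the longer of the two inputs
int maxPages = Math.max(sourceDoc.getNumberOfPages(), targetDoc.getNumberOfPages());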
PDPage sourceIndexPage;
PDPage targetIndexPage;
PDRectangle pdf1Frame;
PDRectangle pdf2Frame;
PDRectangle outPdfFrame;
COSDictionary dict;
PDPage outPdfPage;
LayerUtility layerUtility;
PDFormXObject sourceFormPDF;
PDFormXObject targetFormPDF;
AffineTransform afLeft;
AffineTransform afRight;

for (int indexPage = 0; indexPage < maxPages; indexPage++) {

    // Create output PDF frame
    try {
        sourceIndexPage = sourceDoc.getPage(indexPage);
    } catch (IndexOutOfBoundsException error) {
        sourceDoc.addPage(new PDPage());
        sourceIndexPage = sourceDoc.getPage(indexPage);
    }

    try {
        targetIndexPage = targetDoc.getPage(indexPage);
    } catch (IndexOutOfBoundsException error) {
        targetDoc.addPage(new PDPage());
        targetIndexPage = targetDoc.getPage(indexPage);
    }

    sourceIndexPage.setRotation(0);
    targetIndexPage.setRotation(0);

    pdf1Frame = sourceIndexPage.getCropBox();
    pdf2Frame = targetIndexPage.getCropBox();
    outPdfFrame = new PDRectangle(pdf1Frame.getWidth() + pdf2Frame.getWidth(),
            Math.max(pdf1Frame.getHeight(), pdf2Frame.getHeight()));

    // Create output page with calculated frame and add it to the document
    dict = new COSDictionary();
    dict.setItem(COSName.TYPE, COSName.PAGE);
    dict.setItem(COSName.MEDIA_BOX, outPdfFrame);
    dict.setItem(COSName.CROP_BOX, outPdfFrame);
    dict.setItem(COSName.ART_BOX, outPdfFrame);
    outPdfPage = new PDPage(dict);
    outDoc.addPage(outPdfPage);

    // Source PDF pages have to be imported as form XObjects so that they can be
    // placed at a specific position in the output page
    layerUtility = new LayerUtility(outDoc);
    sourceFormPDF = layerUtility.importPageAsForm(sourceDoc, indexPage);
    targetFormPDF = layerUtility.importPageAsForm(targetDoc, indexPage);

    // Add form objects to output page
    afLeft = new AffineTransform();
    layerUtility.appendFormAsLayer(outPdfPage, sourceFormPDF, afLeft, "left " + indexPage);
    afRight = AffineTransform.getTranslateInstance(pdf1Frame.getWidth(), 0.0);
    layerUtility.appendFormAsLayer(outPdfPage, targetFormPDF, afRight, "right" + indexPage);
}

outDoc.save("outDoc.pdf");

The issue I have is that for some documents the output file is far too large. I expected it to be roughly the size of the source document plus the size of the target document, but in practice it is 10 to 20 times larger.

Looking inside the document's structure, I noticed that the output repeats common resources that were stored separately in the original PDFs. Is there a way to optimize my code so that the result takes less space on disk?

Upvotes: 1

Views: 122

Answers (1)

Alessandro Artoni

Reputation: 103

We solved the problem by post-processing the generated PDF with Ghostscript.
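
For anyone who wants to script that post-processing step from Java, here is a minimal sketch. It assumes Ghostscript is installed and reachable under the executable name you pass in (commonly `gs` on Linux/macOS, `gswin64c` on Windows). The exact options used for this answer are not stated, so the ones below are just the usual `pdfwrite` invocation, and the class and method names (`GhostscriptRewrite`, `buildCommand`, `run`) are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; names are illustrative and not part of PDFBox or Ghostscript. */
class GhostscriptRewrite {

    /** Builds the argument list that re-writes a PDF through Ghostscript's pdfwrite device. */
    static List<String> buildCommand(String gsExecutable, String inputPdf, String outputPdf) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);                 // assumption: e.g. "gs" or "gswin64c"
        cmd.add("-sDEVICE=pdfwrite");          // write a new PDF
        cmd.add("-dNOPAUSE");                  // do not pause between pages
        cmd.add("-dBATCH");                    // exit after the last input file
        cmd.add("-dQUIET");                    // suppress routine console output
        cmd.add("-sOutputFile=" + outputPdf);  // where the rewritten PDF goes
        cmd.add(inputPdf);                     // the file produced by PDFBox
        return cmd;
    }

    /** Runs the command and returns the process exit code (0 normally means success). */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

Something like `GhostscriptRewrite.run(GhostscriptRewrite.buildCommand("gs", "joined.pdf", "joined-small.pdf"))` would then rewrite the file (the file names here are placeholders). How much smaller the result is will depend on the document, so it is worth comparing the file sizes before and after.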

Upvotes: 1
