Mykhailo Yablon

How to split a PDF file into single pages and remove unused objects (optimize)

I need to split large documents (several thousand pages, 1-2 GB) into single-page PDFs using iText 7.

I have already tried splitting the PDF using this example: https://itextpdf.com/en/resources/examples/itext-7/splitting-pdf-file, and also with something like this:

    // outputPdfPath, destFolder, pages and log come from the surrounding code
    try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(outputPdfPath.toString()))) {
        Files.createDirectories(Paths.get(destFolder));

        int numberOfPages = pdfDoc.getNumberOfPages();

        // iText page numbers are 1-based, the pages list is 0-based
        for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++) {
            String target = destFolder + pages.get(pageNumber - 1).id + ".pdf";
            try (PdfDocument document = new PdfDocument(new PdfWriter(target))) {
                // copy exactly one page into its own document
                pdfDoc.copyPagesTo(pageNumber, pageNumber, document);
            }
        }
        log.info("Provided PDF has been split into multiple.");
    }

Both approaches work fine, but the resulting documents are large and contain lots of unused fonts, images and other objects. How can I remove these unused objects so that the new single-page PDFs are smaller?

Answers (1)

Uladzimir Asipchuk

The problem with your document is as follows: each page shares many (maybe even all) of the document's fonts and XObjects. When copying pages, iText doesn't know whether a resource is actually needed on the page or not: it just copies all of them, and that's why the resulting PDFs are so large.
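To confirm that the pages really share resources, you could list what each page's resource dictionary declares. This is a minimal sketch assuming iText 7's kernel module; `ResourceReport` and `resourceCounts` are hypothetical names. It only looks at the page's own resource dictionary (resources nested inside form XObjects are not counted), and a declared resource is not necessarily used by the page's content, which is exactly the gap described above.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfResources;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ResourceReport {

    // Hypothetical helper: for each page, returns {number of fonts, number of XObjects}
    // declared in that page's resource dictionary. Declared is not the same as used.
    public static List<int[]> resourceCounts(String path) throws IOException {
        List<int[]> counts = new ArrayList<>();
        try (PdfDocument pdf = new PdfDocument(new PdfReader(path))) {
            for (int i = 1; i <= pdf.getNumberOfPages(); i++) {
                PdfResources resources = pdf.getPage(i).getResources();
                counts.add(new int[] {
                        resources.getResourceNames(PdfName.Font).size(),
                        resources.getResourceNames(PdfName.XObject).size()
                });
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<int[]> counts = resourceCounts(args[0]);
        for (int i = 0; i < counts.size(); i++) {
            System.out.printf("page %d: %d fonts, %d XObjects%n",
                    i + 1, counts.get(i)[0], counts.get(i)[1]);
        }
    }
}
```

If every single-page file reports the same large counts as the original document, the resources are being copied wholesale.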

The option you are looking for is iText's pdfSweep.

Its general purpose is redacting page content; besides that, however, pdfSweep also optimizes the pages while redacting them.

So how do you solve your problem?

a) Specify the redaction area as a degenerate (zero-area) rectangle.

b) Clean up the pages (of the split documents or of the original document):

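    import com.itextpdf.kernel.geom.Rectangle;
    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.kernel.pdf.PdfWriter;
    import com.itextpdf.pdfcleanup.PdfCleanUpLocation;
    import com.itextpdf.pdfcleanup.PdfCleanUpTool;
    import java.util.Collections;
    import java.util.List;

    // input/output: paths of a split single-page PDF and of its optimized copy
    try (PdfDocument pdfDocument = new PdfDocument(new PdfReader(input), new PdfWriter(output))) {
        // Degenerate rectangle on page 1: nothing is redacted, but pdfSweep still rewrites the page
        List<PdfCleanUpLocation> cleanUpLocations = Collections.singletonList(
                new PdfCleanUpLocation(1, new Rectangle(0, 0, 0, 0), null));
        PdfCleanUpTool cleaner = new PdfCleanUpTool(pdfDocument, cleanUpLocations);
        cleaner.cleanUp();
    }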
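If you would rather split and clean up in one pass, one possible sketch is to copy each page into an in-memory single-page PDF and then run pdfSweep on it, writing straight to the final file. It assumes iText 7 with the pdfSweep add-on on the classpath; the class name `SplitAndSweep` and the `page-0001.pdf` naming scheme are hypothetical (your code names files by `pages.get(...).id` instead).

```java
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.pdfcleanup.PdfCleanUpLocation;
import com.itextpdf.pdfcleanup.PdfCleanUpTool;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class SplitAndSweep {

    // Hypothetical helper: writes one file per page (page-0001.pdf, ...) into destFolder,
    // running pdfSweep with a zero-area rectangle on each one. Returns the page count.
    public static int splitAndOptimize(Path source, Path destFolder) throws IOException {
        Files.createDirectories(destFolder);
        try (PdfDocument src = new PdfDocument(new PdfReader(source.toString()))) {
            int pages = src.getNumberOfPages();
            for (int page = 1; page <= pages; page++) {
                // Step 1: copy a single page into an in-memory PDF
                ByteArrayOutputStream singlePage = new ByteArrayOutputStream();
                try (PdfDocument one = new PdfDocument(new PdfWriter(singlePage))) {
                    src.copyPagesTo(page, page, one);
                }
                // Step 2: re-read it and let pdfSweep rewrite page 1 into the target file
                Path target = destFolder.resolve(String.format("page-%04d.pdf", page));
                try (PdfDocument sweep = new PdfDocument(
                        new PdfReader(new ByteArrayInputStream(singlePage.toByteArray())),
                        new PdfWriter(target.toString()))) {
                    new PdfCleanUpTool(sweep, Collections.singletonList(
                            new PdfCleanUpLocation(1, new Rectangle(0, 0, 0, 0), null)))
                            .cleanUp();
                }
            }
            return pages;
        }
    }
}
```

Keeping the intermediate page in memory avoids writing every page to disk twice, and only one page's bytes are held at a time, which matters for documents with thousands of pages.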

I tried this approach on the first of your resulting documents (the one containing the first page).

The size of the document before pdfSweep processing: 9282 KB.

The size of the document after pdfSweep processing: 549 KB.
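Those two figures correspond to roughly a 94% reduction ((9282 - 549) / 9282 ≈ 0.941). To measure the same thing on your own split files, a small plain-Java helper like this one could be used; `SizeReport`, `reductionPercent` and `compare` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

public class SizeReport {

    // Percentage by which the 'after' size is smaller than the 'before' size
    public static double reductionPercent(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    // Hypothetical helper: compares two files on disk, e.g. a split page before and after pdfSweep
    public static String compare(Path before, Path after) throws IOException {
        long b = Files.size(before);
        long a = Files.size(after);
        return String.format(Locale.ROOT, "%d KB -> %d KB (%.1f%% smaller)",
                b / 1024, a / 1024, reductionPercent(b, a));
    }

    public static void main(String[] args) {
        // Figures reported above: 9282 KB before, 549 KB after
        System.out.println(String.format(Locale.ROOT, "%.1f%%", reductionPercent(9282, 549))); // prints 94.1%
    }
}
```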