Jordan Reiter
Jordan Reiter

Reputation: 20992

Crop PDF & add margins

I have a PDF with a CropBox size of 6" wide x 9" high. I need to add it to a standard letter-sized PDF. If I change the CropBox size, then the cropmarks become visible. So ideally what I'd like to do is crop out just the visible portion of the page, then pad the sides so that the total height and width is letter-sized.

Is this possible using PDFBox or another Java class?

Upvotes: 3

Views: 10709

Answers (3)

Baked Inhalf
Baked Inhalf

Reputation: 3735

Other than adding a rectangle to the PDPage constructor you can do this do set the CropBox to any size:

PDRectangle box = new PDRectangle(pageWidth, pageHeight);
page.setMediaBox(box); // MediaBox > BleedBox > TrimBox/CropBox

Upvotes: 1

Zelle
Zelle

Reputation: 21

I have adopted the answer of John a little bit, maybe this will help someone.

I have changed the loop to create a new rectangle, with the wanted dimensions. Then the rectangle is set to the page and afterwards added to the new document. I used this snippet to crop a black border out of a long scanned document.

Notice that this will change the size of the pages.

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;


import java.io.File;
import java.io.IOException;
import java.util.List;

public class Main {


    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException, COSVisitorException {

        File inputFile = new File("/path/to/your/file");
        File outputFile = new File("/path/to/your/file");

        PDDocument inputDoc = PDDocument.load(inputFile);
        PDDocument outputDoc = new PDDocument();

        List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();

        // Lets paint our pages white !
        for (PDPage page : pages) {
            PDRectangle rectangle=new PDRectangle();
            rectangle.setLowerLeftX(0);
            rectangle.setLowerLeftY(0);
            rectangle.setUpperRightX(500);
            rectangle.setUpperRightY(680);

            page.setMediaBox(rectangle);
            page.setCropBox(rectangle);
            outputDoc.addPage(page);
        }

        // Save to file
//        outputFile.delete();
        outputDoc.save(outputFile);

        // Wait until the end to close all documents, or else you get an error
        inputDoc.close();
        outputDoc.close();
    }
}

Upvotes: 2

John Pink
John Pink

Reputation: 637

Have you found an answer to your problem ? I have been facing the same scenario this week.

I have a standard letter-size (8,5" x 11") PDF A, containing a header, a footer, and a form. I have no control over that PDF's generation, so the header and footer are a bit dirty and I need to remove them. My first approach was to extract the form into a Box (any type of box works), and then export it as a new PDF page. Problem is, my new Box is a certain size (let's say 6" x 7"), and after thorough research into the docs, I was unable to find a way to embed it into a 8,5" x 11" PDF B ; the output PDF was the same size as my Box. All scenarios either led to a blank PDF file of the right size, or a PDF containing my form but of wrong dimensions.

I then had no choice but to use another approach. It isn't very clean, but hey, when working with PDFs, black magic and workarounds are the main topic. I simply kept the original PDF A, and blanked out all the unwanted parts. That means, I created rectangles, filled them with white, and covered up the sections I wanted to hide. Result is a PDF file, of right dimension, containing only my form. Hooray ! Technically, the header and footer are still present in the page, there was no way to actually remove them ; I was only able to hide them (this doesn't make any difference to the end user as long as you're not hiding sensitive data).

I realize your question was submitted 2 years ago, but I had a very hard time finding a proper answer to my question online, so here's me giving back to the community, and hoping I can help future developers save some time. If you actually found a way to extract a box and embed it in a standard-size page, please post your answer !

Here is my code by the way :

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;

import java.awt.Color;
import java.io.*;
import java.util.List;

// This code doesn't actually extract PDF elements per say
// It fills 2 rectangles in white to hide the header and the footer of our PDF page
public class ex {

    // Arbitrary values obtained in a very obscure way
    static int PAGE_WIDTH = 615;
    static int PAGE_HEIGHT = 815;

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException, COSVisitorException {

        File inputFile = new File("C:\\input.pdf");
        File outputFile = new File("C:\\output.pdf");

        PDDocument inputDoc = PDDocument.load(inputFile);
        PDDocument outputDoc = new PDDocument();

        List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();

        PDPageContentStream pageCS = null;

        // Lets paint our pages white !
        for (PDPage page : pages) {
            pageCS = new PDPageContentStream(inputDoc, page, true, false);
            pageCS.setNonStrokingColor(Color.white);
            // Top rectangle
            pageCS.fillRect(0, 0, PAGE_WIDTH, 30);
            // Bottom rectangle
            pageCS.fillRect(0, PAGE_HEIGHT-30, PAGE_WIDTH, 30);
            pageCS.close();
            outputDoc.addPage(page);
        }

        // Save to file
        outputFile.delete();
        outputDoc.save(outputFile);

        // Wait until the end to close all documents, or else you get an error
        inputDoc.close();
        outputDoc.close();
    }
}

Upvotes: 3

Related Questions