Ville M
Ville M

Reputation: 2029

How do I crop each page in pdf using PDFBOX in Java?

I want to remove the bottom part of each page in the PDF, but not change page size, what is the recommended way to do this in java in PDFBOX? How to remove the footer from each page in PDF?

Is there possibly a way to use PDRectangle to just delete all text/images within it?

snippet of what I tried, using rectangle with setCropBox seems to lose page size, maybe cropBox is not intended for this?

            PDRectangle rectangle = new PDRectangle();
            rectangle.setUpperRightY(mypage.findCropBox().getUpperRightY());
            rectangle.setLowerLeftY(50);
            rectangle.setUpperRightX(mypage.findCropBox().getUpperRightX());
            rectangle.setLowerLeftX(mypage.findCropBox().getLowerLeftX());                  
            mypage.setCropBox(rectangle);
            croppedDoc.addPage(mypage);
            croppedDoc.save(filename);              
            croppedDoc.close();

Closest example in pdfbox cookbook examples I could find is on how to remove entire page, however this is not what I'm looking for, I'd like to just delete few elements from the page: http://pdfbox.apache.org/userguide/cookbook.html

Upvotes: 3

Views: 7205

Answers (2)

yms
yms

Reputation: 10418

The CropBox is the way to go if you want to remove a portion of a page while keeping a rectangular region visible. If you want the page size to remain the same, you need the MediaBox to remain the same.

From the PDF Spec:

CropBox - rectangle (Optional; inheritable) A rectangle, expressed in default user space units, defining the visible region of default user space. When the page is displayed or printed, its contents are to be clipped (cropped) to this rectangle and then imposed on the output medium in some implementation-defined manner (see Section 10.10.1, “Page Boundaries”). Default value: the value of MediaBox.

MediaBox - rectangle (Required; inheritable) A rectangle (see Section 3.8.4, “Rectangles”), expressed in default user space units, defining the boundaries of the physical medium on which the page is intended to be displayed or printed (see Section 10.10.1, “Page Boundaries”).

A have seen (faulty) applications and libraries that force the CropBox and the MediaBox to be the same, double check that this is not what is happening on your case.

Also take into account that the coordinates origin (0,0) in PDF is the bottom-left corner, some libraries do the translation to top-left for you, some others not, you may also want to double check this on the library you are using.

Upvotes: 3

Ed Staub
Ed Staub

Reputation: 15690

I'm also a newbie, but take a look at this page, in particular, the description of TrimBox. If there's no TrimBox on the page, it defaults to CropBox, which would cause what you're seeing.

In general, don't expect the PDFBox docs to tell you much of anything about PDF itself - to use PDFBox well I think you need to go elsewhere - AFAIK, mostly just to the PDF specification. I haven't even skimmed it yet, though!

Upvotes: 3

Related Questions