user3573411

Reputation: 63

Coordinates in a PDF page when using PDFBox

I am adding hidden text to PDF files to make them searchable. For some documents the origin (0,0) seems to be the bottom-left corner, whereas for others it is the top-left corner. My understanding is that this can be caused by page rotation.

In the code below I print the page rotation, but it shows up as 0 for all the different test PDF files I have. Any ideas why some documents would map (0,0) to the bottom left while others map it to the top left?

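        import java.io.File;
        import java.io.IOException;

        import org.apache.pdfbox.pdmodel.PDDocument;
        import org.apache.pdfbox.pdmodel.PDPage;
        import org.apache.pdfbox.pdmodel.PDPageContentStream;
        import org.apache.pdfbox.pdmodel.font.PDType1Font;

        public class AddText {

            // Uses the PDFBox 2.x API (PDDocument.load, PDType1Font.COURIER)
            static void addText(String inputDocumentName, String outputDocumentName, String text)
                    throws IOException {
                // try-with-resources closes the document even if an exception occurs
                try (PDDocument document = PDDocument.load(new File(inputDocumentName))) {
                    // Retrieving the first page of the document
                    PDPage page = document.getPage(0);

                    int rotation = page.getRotation();
                    System.out.println("Rotation: " + rotation);

                    // Append to the existing page content; resetContext = true wraps the
                    // existing content in q/Q so graphics state changes it leaves behind
                    // do not affect the new text
                    try (PDPageContentStream contentStream = new PDPageContentStream(
                            document, page, PDPageContentStream.AppendMode.APPEND, true, true)) {
                        // Begin the text object
                        contentStream.beginText();
                        contentStream.setFont(PDType1Font.COURIER, 20);
                        // Position the text at (0, 0) of the page's user space
                        contentStream.newLineAtOffset(0, 0);
                        contentStream.showText(text);
                        contentStream.endText();
                    }

                    // Saving the document (after the content stream has been closed)
                    document.save(new File(outputDocumentName));
                }
            }
        }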

Any ideas on how I can find which corner (0,0) represents in a PDF document? Thanks.

Upvotes: 1

Views: 2808

Answers (1)

mkl

Reputation: 95918

Each page starts with a coordinate system in which x coordinates increase to the right and y coordinates increase upwards. The coordinates may be arbitrarily large, limited only by the range and precision of the numeric types used.

On this large plane certain boxes are defined; see the quote from the PDF specification in this answer. Of special interest here is the crop box, which defines the region to which the contents of the page shall be clipped (cropped) when displayed or printed, i.e. it defines the visible page area. If the page does not specify a crop box, it defaults to the media box, which is mandatory.
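To make the defaulting rule concrete, here is a minimal plain-Java sketch, not PDFBox code. It assumes rectangles are given as hypothetical `float[] {llx, lly, urx, ury}` arrays and ignores inheritance from parent page tree nodes. It also applies the rule from the PDF specification that a crop box extending beyond the media box is effectively reduced to its intersection with the media box. An empty intersection is not handled.

```java
import java.util.Arrays;

/** Sketch: derive the effective (visible) crop box from MediaBox and optional CropBox. */
public final class EffectiveCropBox {

    private EffectiveCropBox() {}

    /** A PDF rectangle may list any two opposite corners; normalize to {llx, lly, urx, ury}. */
    public static float[] normalize(float[] r) {
        return new float[] {
            Math.min(r[0], r[2]), Math.min(r[1], r[3]),
            Math.max(r[0], r[2]), Math.max(r[1], r[3])
        };
    }

    /**
     * @param mediaBox the page's MediaBox (mandatory)
     * @param cropBox  the page's CropBox, or null if the page does not define one
     */
    public static float[] effectiveCropBox(float[] mediaBox, float[] cropBox) {
        float[] media = normalize(mediaBox);
        if (cropBox == null) {
            return media; // default: the crop box equals the media box
        }
        float[] crop = normalize(cropBox);
        // A crop box extending beyond the media box is reduced to the intersection
        return new float[] {
            Math.max(media[0], crop[0]), Math.max(media[1], crop[1]),
            Math.min(media[2], crop[2]), Math.min(media[3], crop[3])
        };
    }

    public static void main(String[] args) {
        float[] media = {0f, 0f, 612f, 792f};
        // prints "[0.0, 0.0, 612.0, 792.0]"
        System.out.println(Arrays.toString(effectiveCropBox(media, null)));
        // prints "[0.0, 36.0, 576.0, 792.0]"
        System.out.println(Arrays.toString(
                effectiveCropBox(media, new float[] {-10f, 36f, 576f, 900f})));
    }
}
```

In PDFBox itself you would not compute this by hand; `PDPage.getCropBox()` already returns the resulting rectangle.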

For display, this visible area is then rotated clockwise by the page's Rotate value, which is a multiple of 90°.

Concerning your question

Any ideas on how I can find which corner (0,0) represents in a PDF document?

therefore, you should first be aware that the origin (0,0) of the user space coordinate system need not be a corner at all; it may be virtually anywhere inside or outside the visible area. Often a corner of the crop box or media box is chosen as the origin merely to keep things simple. Furthermore, each page may position its origin differently; there is no need to keep it the same across the pages of a document.
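As a small illustration, the following plain-Java sketch classifies where (0,0) falls relative to a crop box given as lower-left and upper-right coordinates in unrotated user space. The class and method names are hypothetical; rotation is deliberately ignored here, and degenerate (zero-width or zero-height) boxes are not treated specially.

```java
/** Sketch: where does the user-space origin lie relative to an (unrotated) crop box? */
public final class OriginLocator {

    private OriginLocator() {}

    public static String describeOrigin(float llx, float lly, float urx, float ury) {
        boolean atLeft = llx == 0f, atRight = urx == 0f;
        boolean atBottom = lly == 0f, atTop = ury == 0f;
        if (atLeft && atBottom) return "lower-left corner";
        if (atLeft && atTop) return "upper-left corner";
        if (atRight && atBottom) return "lower-right corner";
        if (atRight && atTop) return "upper-right corner";
        boolean insideX = llx <= 0f && 0f <= urx;
        boolean insideY = lly <= 0f && 0f <= ury;
        if (insideX && insideY) {
            // Inside the box, but possibly on one of its edges
            return (atLeft || atRight || atBottom || atTop) ? "on an edge" : "inside";
        }
        return "outside";
    }

    public static void main(String[] args) {
        System.out.println(describeOrigin(0f, 0f, 595f, 842f));           // lower-left corner
        System.out.println(describeOrigin(0f, -842f, 595f, 0f));          // upper-left corner
        System.out.println(describeOrigin(-297.5f, -421f, 297.5f, 421f)); // inside
        System.out.println(describeOrigin(100f, 100f, 695f, 942f));       // outside
    }
}
```

With PDFBox, the four arguments would come from `page.getCropBox()` via `getLowerLeftX()`, `getLowerLeftY()`, `getUpperRightX()` and `getUpperRightY()`. The second example shows that a crop box such as [0 -842 595 0] puts the origin at the upper-left corner even with a Rotate value of 0.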

Methods that help you determine where and how the visible area of a given page is located with respect to the coordinate system:

  • PDPage.getCropBox returns the coordinates of the corners of the crop box. It takes inheritance and defaulting into account, and it also intersects the crop box with the media box.
  • PDPage.getRotation returns the page rotation (clockwise, in multiples of 90°).

Thus, take the corner coordinates returned by the first method and, depending on the value returned by the second, pick the coordinates of your corner of interest.
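That two-step recipe can be sketched in plain Java as follows. This is a hypothetical helper, not part of PDFBox: it takes the crop box corners and the clockwise Rotate value (assumed to be a multiple of 90) and returns the user-space coordinates of a corner as it appears on screen. A clockwise quarter turn moves the left edge to the top, so on a page rotated by 90 the displayed upper-left corner is the crop box's lower-left corner.

```java
/** Sketch: user-space coordinates of a displayed corner of a rotated page. */
public final class DisplayedCorner {

    /** Corners as seen on screen, listed counterclockwise from the lower left. */
    public enum Corner { LOWER_LEFT, LOWER_RIGHT, UPPER_RIGHT, UPPER_LEFT }

    private DisplayedCorner() {}

    /**
     * Returns {x, y} in user space of the given displayed corner of the crop box
     * {llx, lly, urx, ury}, for a clockwise rotation that is a multiple of 90.
     */
    public static float[] locate(float llx, float lly, float urx, float ury,
                                 int rotation, Corner corner) {
        // Normalize e.g. -90 or 450 to 0..3 quarter turns
        int quarterTurns = ((rotation % 360 + 360) % 360) / 90;
        // Unrotated corners, in the same counterclockwise order as Corner
        float[][] unrotated = {
            {llx, lly}, // lower left
            {urx, lly}, // lower right
            {urx, ury}, // upper right
            {llx, ury}  // upper left
        };
        // Each clockwise quarter turn moves every corner one position clockwise,
        // so displayed position i shows unrotated corner (i + quarterTurns) mod 4
        float[] p = unrotated[(corner.ordinal() + quarterTurns) % 4];
        return new float[] {p[0], p[1]};
    }

    public static void main(String[] args) {
        // Example crop box (A4 size, not starting at the origin), page rotated by 90
        float[] ul = locate(10f, 20f, 605f, 862f, 90, Corner.UPPER_LEFT);
        System.out.println(ul[0] + ", " + ul[1]); // prints "10.0, 20.0"
    }
}
```

With PDFBox the inputs would be `PDRectangle box = page.getCropBox();` together with `page.getRotation()`, passing `box.getLowerLeftX()`, `box.getLowerLeftY()`, `box.getUpperRightX()` and `box.getUpperRightY()`. Note that this only locates the corner; it does not make text drawn there read upright on a rotated page.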

Upvotes: 2
