Reputation: 57
Hi, I am trying to find the origin, i.e. the x and y coordinates, of a page. Are there any code examples using PDFBox, and also theory, that will help to find the origin of a page in a PDF?
By that I mean: we need to find whether the origin is at the bottom left, bottom right, top right, top left, or in the middle of the page.
Upvotes: 1
Views: 501
Reputation: 95918
First of all, I assume we are talking about user space coordinates, not device space coordinates. When rendering a PDF, coordinates eventually are translated to the device space of the rendering target. But device space coordinates are device dependent and, therefore, not really appropriate for generic PDF processing tasks.
The default user space coordinate system is in particular used for positioning annotations and is the initial user space coordinate system when starting to process the instructions of the page content stream.
This coordinate system is specified by the effective crop box of the page (which defaults to its media box):
The user space coordinate system shall be initialised to a default state for each page of a document. The CropBox entry in the page dictionary shall specify the rectangle of user space corresponding to the visible area of the intended output medium (display window or printed page). The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the Rotate entry in the page dictionary).
(ISO 32000-2, section 8.3.2.3 "User space")
Thus, even without considering the page rotation, the origin may be anywhere inside, on the edge, or outside the visible page area, e.g. for the following CropBox values:
- [ 0 0 612 792 ]: origin in the lower left
- [ 0 -792 612 0 ]: origin in the upper left
- [ -306 -396 306 396 ]: origin in the center of the page
- [ -1612 1000 -1000 1792 ]: origin off page to the right and below
If you also take page rotation into account, the origin rotates with the page:
| Key | Type | Value |
| --- | --- | --- |
| Rotate | integer | (Optional; inheritable) The number of degrees by which the page shall be rotated clockwise when displayed or printed. The value shall be a multiple of 90. Default value: 0. |
(ISO 32000-2, Table 31 "Entries in a page object")
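The two quoted rules (the crop box rectangle and the clockwise Rotate value) are enough to work out mechanically where the origin ends up on the displayed page. The following is only a minimal sketch in plain Java, not PDFBox code: the class `OriginLocator` and its methods are names made up for this illustration, and it assumes a normalized crop box (lower left corner first). With PDFBox, the inputs would presumably come from `PDPage.getCropBox()` (via `getLowerLeftX()`, `getLowerLeftY()`, `getUpperRightX()`, `getUpperRightY()`) and `PDPage.getRotation()`.

```java
// Sketch only: OriginLocator and its methods are hypothetical names.
final class OriginLocator {

    // Origin (0, 0) of the default user space, measured from the lower left
    // corner of the page as displayed (after the clockwise rotation).
    static double[] originOnDisplayedPage(double llx, double lly,
                                          double urx, double ury, int rotate) {
        double width = urx - llx;   // unrotated crop box width
        double height = ury - lly;  // unrotated crop box height
        double x = 0 - llx;         // origin relative to the crop box corner
        double y = 0 - lly;
        switch (normalize(rotate)) {
            case 0:   return new double[] { x, y };
            case 90:  return new double[] { y, width - x };
            case 180: return new double[] { width - x, height - y };
            case 270: return new double[] { height - y, x };
            default:
                throw new IllegalArgumentException("Rotate must be a multiple of 90");
        }
    }

    // Describes the origin position horizontally and vertically.
    static String describe(double llx, double lly, double urx, double ury, int rotate) {
        double[] o = originOnDisplayedPage(llx, lly, urx, ury, rotate);
        // For 90 and 270 the displayed width and height are swapped.
        boolean sideways = normalize(rotate) % 180 != 0;
        double shownWidth = sideways ? ury - lly : urx - llx;
        double shownHeight = sideways ? urx - llx : ury - lly;
        return place(o[0], shownWidth, "left", "right") + ", "
             + place(o[1], shownHeight, "bottom", "top");
    }

    // Maps any integer to the range 0..359 so that e.g. -90 is treated as 270.
    private static int normalize(int rotate) {
        return ((rotate % 360) + 360) % 360;
    }

    private static String place(double v, double size, String low, String high) {
        if (v < 0) return "beyond " + low + " edge";
        if (v == 0) return low + " edge";
        if (v < size) return "inside";
        if (v == size) return high + " edge";
        return "beyond " + high + " edge";
    }

    public static void main(String[] args) {
        System.out.println(describe(0, 0, 612, 792, 90));
        // prints "left edge, top edge"
        System.out.println(describe(-1612, 1000, -1000, 1792, 0));
        // prints "beyond right edge, beyond bottom edge"
    }
}
```

The sketch reports the position per axis instead of naming corners, so that origins inside, on the edge of, or outside the visible area are all covered by the same two comparisons.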
So, for example, for the crop box [ 0 0 612 792 ] and the following Rotate values:
- 0: origin in the lower left
- 90: origin in the upper left
- 180: origin in the upper right
- 270: origin in the lower right
and for the crop box [ -1612 1000 -1000 1792 ]:
- 0: origin off page to the right and below
- 90: origin off page to the left and below
- 180: origin off page to the left and above
- 270: origin off page to the right and above
Of course, the directions of the coordinate axes also change to match the rotation:
- 0: x coordinates increase to the right, y coordinates upwards
- 90: x coordinates increase downwards, y coordinates to the right
- 180: x coordinates increase to the left, y coordinates downwards
- 270: x coordinates increase upwards, y coordinates to the left
While processing the instructions of a page content stream, the user space may be transformed further, in particular by the cm instruction:
| Operands | Operator | Description |
| --- | --- | --- |
| a b c d e f | cm | Modify the current transformation matrix (CTM) by concatenating the specified matrix (see 8.3.2, "Coordinate spaces"). Although the operands specify a matrix, they shall be written as six separate numbers, not as an array. |
(ISO 32000-2, Table 56 "Graphics state operators")
One use case for this is to have the current coordinate system "the right side up" after rotation.
For example, for the crop box [ 0 0 612 792 ] and the page rotation 90, the coordinate system has its origin in the upper left, x coordinates increase downwards, and y coordinates increase to the right. To straighten this out, you'll often find a cm instruction like this at the start of the page content stream:
0 1 -1 0 612 0 cm
After this instruction the origin on the rotated page in our example is again in the lower left, and x coordinates increase to the right and y coordinates upwards.
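This claim can be checked numerically. The sketch below is plain Java with made-up names (`CmCheck`, `applyCm`, `displayedAtRotate90`). It assumes the usual PDF convention that the operands a b c d e f map a point of the new coordinate system to x' = a·x + c·y + e, y' = b·x + d·y + f in the previous one, and that a crop box starting at (0, 0) is shown with Rotate 90.

```java
import java.util.Arrays;

// Sketch only: CmCheck and its methods are hypothetical names.
final class CmCheck {

    // Applies the operands a b c d e f of a cm instruction to a point:
    // x' = a*x + c*y + e, y' = b*x + d*y + f.
    static double[] applyCm(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    // Default user space -> position on the displayed page for Rotate 90,
    // measured from the displayed lower left corner. Assumes the crop box
    // starts at (0, 0), as [ 0 0 612 792 ] does.
    static double[] displayedAtRotate90(double x, double y, double cropWidth) {
        return new double[] { y, cropWidth - x };
    }

    public static void main(String[] args) {
        double[] cm = { 0, 1, -1, 0, 612, 0 };
        // The new origin, one unit along the new x axis, one along the new y axis.
        double[][] probes = { { 0, 0 }, { 1, 0 }, { 0, 1 } };
        for (double[] p : probes) {
            double[] user = applyCm(cm, p[0], p[1]);
            double[] shown = displayedAtRotate90(user[0], user[1], 612);
            System.out.println(Arrays.toString(p) + " -> " + Arrays.toString(shown));
        }
    }
}
```

Each probe point comes out at the same coordinates on the displayed page: the new origin lands in the displayed lower left corner, one unit of x moves to the right, and one unit of y moves upwards.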
Upvotes: 2