Converting PDFBox coordinates to pixel coordinates of PDPage::convertToImage

Question

I'm using PDFBox's PDPage::convertToImage to display PDF pages in Java. I'm trying to create clickable areas on the PDF page's image based on COSObjects in the page (namely, AcroForm fields). The problem is that the PDF seems to use a completely different coordinate system:

System.out.println(field.getDictionary().getItem(COSName.RECT));

yields

COSArray{[COSFloat{149.04}, COSFloat{678.24}, COSInt{252}, COSFloat{697.68}]}

If I were to estimate the actual dimensions of the field's rectangle on the image, it would be 40,40,50,10 (x,y,width,height). There's no obvious correlation between the two and I can't seem to find any information about this with Google.

How can I determine the pixel position of a PDPage's COSObjects?

fabian · Accepted Answer

The PDF coordinate system is not that different from the coordinate system used in images. The only differences are:

the y-axis points up, not down
the scale is most likely different.

You can convert from PDF coordinates to image coordinates using these formulae:

x_image = x_pdf * width_image / width_page
y_image = (height_page - y_pdf) * height_image / height_page
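As a minimal sketch, the two formulae could be written as plain Java helpers. The class and method names here are made up for illustration, and the page and image sizes in main are assumed values, not taken from the question. The sketch also assumes the page is not rotated and that its media box starts at (0, 0).

```java
public class PdfToImageCoordinates {

    /** Scales a PDF x-coordinate to an image x-coordinate. */
    public static double toImageX(double xPdf, double pageWidth, double imageWidth) {
        return xPdf * imageWidth / pageWidth;
    }

    /** Flips the y-axis (PDF points up, image points down), then scales. */
    public static double toImageY(double yPdf, double pageHeight, double imageHeight) {
        return (pageHeight - yPdf) * imageHeight / pageHeight;
    }

    public static void main(String[] args) {
        // Assumed sizes, for illustration only:
        // a 612 x 792 pt page rendered to a 1224 x 1584 px image.
        double pageWidth = 612, pageHeight = 792;
        double imageWidth = 1224, imageHeight = 1584;

        // One corner of the rectangle printed in the question.
        double x = toImageX(149.04, pageWidth, imageWidth);
        double y = toImageY(697.68, pageHeight, imageHeight);
        System.out.printf("x=%.2f y=%.2f%n", x, y);
    }
}
```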

To get the page size, use the media box of the page that contains the annotation:

PDRectangle pageBounds = page.getMediaBox();

You may have missed the correlation between the array from the PDF and your image coordinate estimates because a PDF rectangle is stored as the array [x_left, y_bottom, x_right, y_top], not as (x, y, width, height).
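To make the array layout concrete, here is a rough sketch that turns a [x_left, y_bottom, x_right, y_top] array into an image-space (x, y, width, height) rectangle with a top-left origin. The class name and the page and image sizes are assumptions for illustration. The sketch also assumes an unrotated page whose media box starts at (0, 0).

```java
import java.awt.geom.Rectangle2D;

public class PdfRectToImageRect {

    /**
     * Converts a PDF rectangle given as [xLeft, yBottom, xRight, yTop]
     * into an image rectangle (x, y, width, height) with a top-left origin.
     */
    public static Rectangle2D.Double toImageRect(double[] rect,
                                                 double pageWidth, double pageHeight,
                                                 double imageWidth, double imageHeight) {
        double scaleX = imageWidth / pageWidth;
        double scaleY = imageHeight / pageHeight;

        double xLeft = rect[0], yBottom = rect[1], xRight = rect[2], yTop = rect[3];

        // The top edge in PDF space becomes the y of the top-left corner in image space.
        double x = xLeft * scaleX;
        double y = (pageHeight - yTop) * scaleY;
        double width = (xRight - xLeft) * scaleX;
        double height = (yTop - yBottom) * scaleY;
        return new Rectangle2D.Double(x, y, width, height);
    }

    public static void main(String[] args) {
        // The rectangle printed in the question.
        double[] rect = {149.04, 678.24, 252, 697.68};
        // Assumed sizes, for illustration only: a 612 x 792 pt page rendered at 1 px per pt.
        Rectangle2D.Double r = toImageRect(rect, 612, 792, 612, 792);
        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f%n", r.x, r.y, r.width, r.height);
    }
}
```

Under those assumed sizes, the field from the question lands at roughly x=149.04, y=94.32 with a width of 102.96 and a height of 19.44. A rendering at a different resolution scales all four values accordingly.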

Fortunately, PDFBox provides classes that operate at a higher level than the COS structure. Use them to your advantage: for example, work with the PDRectangle you get from the PDAnnotation via getRectangle() instead of reading the COSArray from the field's dictionary.
