Reputation: 11
I am new to PDFBox and am stuck at finding the height of an image in inches. After a couple of searches, this is the piece of code that I am working with:
PDResources resources = aPdPage.findResources();
graphicsState = new PDGraphicsState(aPdPage.findCropBox());
pageWidth = aPdPage.findCropBox().getWidth() / 72;
pageHeight = aPdPage.findCropBox().getHeight() / 72;
@SuppressWarnings("deprecation")
Map<String, PDXObjectImage> imageObjects = resources.getImages();
if (null == imageObjects || imageObjects.isEmpty())
    return;
for (Map.Entry<String, PDXObjectImage> entryxObjects : imageObjects.entrySet()) {
    PDXObjectImage image = entryxObjects.getValue();
    // System.out.println("bits per component: " + image.getBitsPerComponent());
    Matrix ctmNew = graphicsState.getCurrentTransformationMatrix();
    float imageXScale = ctmNew.getXScale();
    float imageYScale = ctmNew.getYScale();
    System.out.println("position = " + ctmNew.getXPosition() + ", " + ctmNew.getYPosition());
    // size in pixel
    System.out.println("size = " + image.getWidth() + "px, " + image.getHeight() + "px");
    // size in page units
    System.out.println("size = " + imageXScale + "pu, " + imageYScale + "pu");
    // size in inches
    imageXScale /= 72;
    imageYScale /= 72;
    System.out.println("size = " + imageXScale + "in, " + imageYScale + "in");
    // size in millimeter
    imageXScale *= 25.4;
    imageYScale *= 25.4;
    System.out.println("size = " + imageXScale + "mm, " + imageYScale + "mm");
    System.out.printf("dpi = %.0f dpi (X), %.0f dpi (Y) %n", image.getWidth() * 72 / ctmNew.getXScale(), image.getHeight() * 72 / ctmNew.getYScale());
}
But the size in inches does not come out correctly: the imageXScale value in page units is always 0.1.
Any help would be appreciated.
Reputation: 95918
First of all, you need to know how bitmap images are usually used in PDFs:
In a PDF, a page object has a collection of so-called resources, among them bitmap image resources, font resources, and so on.
You can inspect these resources like you currently do:
PDResources resources = aPdPage.findResources();
@SuppressWarnings("deprecation")
Map<String, PDXObjectImage> imageObjects = resources.getImages();
if (null == imageObjects || imageObjects.isEmpty())
    return;
for (Map.Entry<String, PDXObjectImage> entryxObjects : imageObjects.entrySet())
{
    PDXObjectImage image = entryxObjects.getValue();
    System.out.println("size = " + image.getWidth() + "px, " + image.getHeight() + "px");
}
But this only gives you the pixel dimension of the images as they are available in the page resources.
When such a resource is painted onto the page, the painting operation first scales it down to a 1x1 unit square and paints this scaled-down version.
The reason why images nevertheless appear at a reasonable size on screen and on paper is that painting operators in PDFs are influenced by the so-called current graphics state. This graphics state contains information like the current fill color, line widths, etc. In particular it also contains the so-called current transformation matrix, which defines how everything an operation draws is to be transformed: stretched, rotated, skewed, translated, and so on.
The usual sequence of operations when drawing a bitmap image looks like this:
the current transformation matrix is first changed to scale the x coordinate by the desired width and the y coordinate by the desired height of the image to draw, and then the image is painted.
Thus, to know the dimensions of the image on the page, you have to know the current transformation matrix as it is when the image drawing operation is executed.
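To make the scaling step concrete, here is a minimal sketch in plain Java (no PDFBox involved) of how a transformation matrix with the six components a b c d e f maps the 1x1 image square onto the page. The class `CtmImageSize`, its method names, and the sample matrix values are hypothetical and chosen only for illustration; the sketch also assumes the default user space unit of 1/72 inch.

```java
/** Hypothetical helper, not part of PDFBox: size of the unit square after a CTM [a b c d e f]. */
final class CtmImageSize {
    // Assumption: default user space, where one unit is 1/72 inch.
    static final double UNITS_PER_INCH = 72.0;

    // The bottom edge of the unit square, the vector (1,0), is mapped to (a,b).
    static double widthInUnits(double a, double b) {
        return Math.hypot(a, b);
    }

    // The left edge of the unit square, the vector (0,1), is mapped to (c,d).
    static double heightInUnits(double c, double d) {
        return Math.hypot(c, d);
    }

    static double toInches(double units) {
        return units / UNITS_PER_INCH;
    }

    public static void main(String[] args) {
        // Made-up matrix as it might be set right before an image is painted:
        // scale x by 144, scale y by 72, translate to (50, 600).
        double a = 144, b = 0, c = 0, d = 72;
        System.out.println("width  = " + toInches(widthInUnits(a, b)) + " in");
        System.out.println("height = " + toInches(heightInUnits(c, d)) + " in");
    }
}
```

Taking the length of the mapped edges instead of reading a and d directly keeps the result meaningful when the image is also rotated.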
Your code, on the other hand, uses the current transformation matrix from a freshly instantiated graphics state with all values at their defaults. Thus, your code prints wrong information on how the image is scaled on the page.
To get the correct information, you have to parse the sequence of operations executed for creating the document page.
This is exactly what the PDFBox PrintImageLocations example does: It processes the page content stream (which contains all those operations), updating a copy of the values of the current graphics state, and when it sees an operation for drawing a bitmap image, it uses the value of the current transformation matrix at that very moment:
protected void processOperator( PDFOperator operator, List arguments ) throws IOException
{
    String operation = operator.getOperation();
    if( INVOKE_OPERATOR.equals(operation) )
    {
        COSName objectName = (COSName)arguments.get( 0 );
        Map<String, PDXObject> xobjects = getResources().getXObjects();
        PDXObject xobject = (PDXObject)xobjects.get( objectName.getName() );
        if( xobject instanceof PDXObjectImage )
        {
            PDXObjectImage image = (PDXObjectImage)xobject;
            PDPage page = getCurrentPage();
            int imageWidth = image.getWidth();
            int imageHeight = image.getHeight();
            double pageHeight = page.getMediaBox().getHeight();
            System.out.println("*******************************************************************");
            System.out.println("Found image [" + objectName.getName() + "]");
            Matrix ctmNew = getGraphicsState().getCurrentTransformationMatrix();
            ...
            [calculate dimensions and rotation of image on page]
            ...
Thus, for your task that PDFBox example should be a good starting point.
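Once you have the scale values from the matrix that was current when the image was painted, the unit conversions from your question can be applied unchanged. A minimal sketch in plain Java, assuming the default 72 page units per inch; the class `ImageResolution` and its methods are hypothetical names, and the numbers stand in for what `getXScale()` and `image.getWidth()` would return:

```java
/** Hypothetical helper, not part of PDFBox: unit conversions for an image drawn on a page. */
final class ImageResolution {
    // Assumption: default user space, where one page unit is 1/72 inch.
    static final double UNITS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double toInches(double pageUnits) {
        return pageUnits / UNITS_PER_INCH;
    }

    static double toMillimeters(double pageUnits) {
        return toInches(pageUnits) * MM_PER_INCH;
    }

    // Effective resolution: image pixels spread over the displayed length in inches.
    static double dpi(int pixels, double displayedPageUnits) {
        return pixels / toInches(displayedPageUnits);
    }

    public static void main(String[] args) {
        int imageWidthPx = 600;      // stand-in for image.getWidth()
        double xScale = 144.0;       // stand-in for the x scale of the matrix at paint time
        System.out.println("width = " + toInches(xScale) + " in");
        System.out.println("width = " + toMillimeters(xScale) + " mm");
        System.out.println("dpi   = " + dpi(imageWidthPx, xScale));
    }
}
```

With the default matrix of a fresh graphics state these conversions still run, but they describe the default matrix rather than the image, which is why the values have to be taken while the content stream is being processed.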
Upvotes: 2