Reputation: 3048
I'm trying to detect images in this PDF using PDFBox. The PDF has two blank images: one on the left (below the text "Put this IN the box") and one on the right (below the text "Affix this OUTSIDE the box"). This is the code I'm using to detect the images:
PDPage page = (PDPage) catalog.getAllPages().get(0);
PDStream contents = page.getContents();
PDFStreamParser parser = new PDFStreamParser(contents.getStream());
parser.parse();
List<Object> tokens = parser.getTokens();
PDResources resources = page.getResources();
Map<String, PDXObjectImage> images = resources.getImages();
if (images != null) {
    for (String key : images.keySet()) {
        System.out.println("Key >>>>>>>>>>>>>> " + key);
    }
}
I'm able to detect the second image, but the first image is not being detected. What is the problem? I'm sure the PDF is valid: I recreated it several times and still see the same problem. I created the PDF using Sketch.
Thanks.
Upvotes: 1
Views: 1188
Reputation: 96039
I'm able to detect the second image. However, the first image is not being detected. What is the problem?
Actually the same image resource is used for both on-page images, merely stretched to different dimensions.
If you look at the content stream of your page, you'll see this at the end:
q
720 0 0 970 832 126 cm
/Im1 Do
Q
q
512 0 0 128 144 968 cm
/Im1 Do
Q
The first four lines draw the image resource Im1 at position (832, 126), stretched to 720 x 970; the last four lines draw the same image resource Im1 at position (144, 968), stretched to 512 x 128.
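To make that reading of the content stream concrete, here is a minimal sketch in plain Java (no PDFBox) that walks such a fragment and reports every Do invocation together with the current transformation matrix. The class ImageDrawScanner and its helper methods are hypothetical names made up for this illustration. It only understands q, Q, cm and Do, and it assumes whitespace-separated tokens and unrotated images, so it is not a substitute for a real content stream parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration class, not part of PDFBox.
public class ImageDrawScanner {

    /** One "Do" invocation: the resource name and its placement from the current matrix. */
    public static final class Draw {
        public final String name;
        public final double x, y, width, height;

        Draw(String name, double x, double y, double width, double height) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Scans a whitespace-separated content stream fragment and records every "Do". */
    public static List<Draw> scan(String content) {
        List<Draw> draws = new ArrayList<>();
        Deque<double[]> saved = new ArrayDeque<>();   // graphics state stack for q / Q
        double[] ctm = {1, 0, 0, 1, 0, 0};            // identity matrix: a b c d e f
        List<String> operands = new ArrayList<>();

        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q":                             // save the graphics state
                    saved.push(ctm.clone());
                    operands.clear();
                    break;
                case "Q":                             // restore the graphics state
                    if (!saved.isEmpty()) {
                        ctm = saved.pop();
                    }
                    operands.clear();
                    break;
                case "cm":                            // concatenate a matrix onto the CTM
                    ctm = concat(lastSix(operands), ctm);
                    operands.clear();
                    break;
                case "Do": {                          // draw the named XObject
                    String name = operands.get(operands.size() - 1).substring(1); // drop '/'
                    // Assumption: no rotation or skew, so a and d are width and height.
                    draws.add(new Draw(name, ctm[4], ctm[5], ctm[0], ctm[3]));
                    operands.clear();
                    break;
                }
                default:                              // anything else is kept as an operand
                    operands.add(token);
            }
        }
        return draws;
    }

    /** Parses the last six operands as the numbers a b c d e f. */
    private static double[] lastSix(List<String> operands) {
        double[] m = new double[6];
        int start = operands.size() - 6;
        for (int i = 0; i < 6; i++) {
            m[i] = Double.parseDouble(operands.get(start + i));
        }
        return m;
    }

    /** Returns m x ctm for PDF-style matrices [a b c d e f]. */
    private static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    public static void main(String[] args) {
        String content = "q 720 0 0 970 832 126 cm /Im1 Do Q "
                       + "q 512 0 0 128 144 968 cm /Im1 Do Q";
        for (Draw d : scan(content)) {
            System.out.printf("%s at (%.0f, %.0f), size %.0f x %.0f%n",
                    d.name, d.x, d.y, d.width, d.height);
        }
    }
}
```

Run on the fragment quoted above, the scan yields two entries, both named Im1, which is exactly what a lookup in the page resources cannot tell you: the resource map has one key, but the page draws it twice.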
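Your approach of merely looking at the page resources to find on-page images is inappropriate, because a single image resource can be drawn any number of times by the content stream (twice in your file), and a resource listed there need not be drawn at all.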
A better solution (failing only for inlined and probably patterned images) is presented in the PDFBox sample PrintImageLocations, the output of which for your file is:
*******************************************************************
Found image [Im1]
position = 832.0, 128.0
size = 360px, 462px
size = 720.0, 970.0
size = 10.0in, 13.472222in
size = 254.0mm, 342.19446mm
*******************************************************************
Found image [Im1]
position = 144.0, 128.0
size = 360px, 462px
size = 512.0, 128.0
size = 7.111111in, 1.7777778in
size = 180.62222mm, 45.155556mm
This sample makes use of the PDFBox PDFStreamEngine to process the content stream instructions that are executed to draw the page.
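The different "size" lines in that output are the same measurement in different units, which a few lines of plain Java can reproduce. The sketch below assumes the PDF default of 72 user space units per inch. The class ImageSizeUnits is a hypothetical name, and the page height of 1224 units used in topOffset is an assumption: it is not stated anywhere above, but it is the value for which both matrix y offsets (126 and 968) come out as the reported y position of 128 when measured from the top of the page.

```java
// Hypothetical helper for illustration; not part of PDFBox.
public class ImageSizeUnits {

    static final float UNITS_PER_INCH = 72f;   // PDF default user space
    static final float MM_PER_INCH = 25.4f;

    /** Converts user space units to inches. */
    static float toInches(float units) {
        return units / UNITS_PER_INCH;
    }

    /** Converts user space units to millimetres. */
    static float toMillimetres(float units) {
        return toInches(units) * MM_PER_INCH;
    }

    /** Distance from the top of the page to the top edge of the image. */
    static float topOffset(float pageHeight, float y, float height) {
        return pageHeight - y - height;
    }

    public static void main(String[] args) {
        float pageHeight = 1224f;               // ASSUMPTION, inferred from the output

        // First draw: 720 x 970 units at y = 126
        System.out.println("width  = " + toInches(720f) + "in, " + toMillimetres(720f) + "mm");
        System.out.println("height = " + toInches(970f) + "in, " + toMillimetres(970f) + "mm");
        System.out.println("top    = " + topOffset(pageHeight, 126f, 970f));

        // Second draw: 512 x 128 units at y = 968
        System.out.println("top    = " + topOffset(pageHeight, 968f, 128f));
    }
}
```

Under that assumption both images start 128 units below the top edge of the page, which would explain why the two reported positions share the same y value even though the matrices in the content stream do not.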
Upvotes: 1