Reputation: 123
I am trying to edit images in a PDF file using the PDFBox library. So far I have an example that works only for JPEG images; it fails to decode images with the 'png' suffix. Here is the code. My question: how can I do the same for all types of images in PDF documents? Can I still use ImageIO for this, or do I need another approach?
public static void main(String[] args) throws Exception {
    PDDocument doc = PDDocument.load("docs/input1.pdf");
    // Get all images from the first page
    Map<String, PDXObjectImage> pageImages = ((PDPage) doc.getDocumentCatalog().getAllPages().get(0)).getResources().getImages();
    if (pageImages != null) {
        // iterate over the images
        Iterator<String> imageIter = pageImages.keySet().iterator();
        while (imageIter.hasNext()) {
            String key = imageIter.next();
            PDXObjectImage image = pageImages.get(key); // get page image object
            String suffix = image.getSuffix(); // get image suffix
            String imageName = key + '.' + suffix; // compose image name
            System.out.print("process " + imageName + "... ");
            COSStream s = image.getCOSStream(); // get COSStream to manipulate
            BufferedImage img = ImageIO.read(s.getFilteredStream()); // get BufferedImage to edit
            if (img == null) {
                System.out.println("Can't decode");
                continue; // nothing to edit, move on to the next image
            }
            paint(img.createGraphics()); // draw on it
            ImageIO.write(img, suffix, new File("out/" + imageName)); // write file to check result...
            // encode image back to COSStream
            OutputStream out = s.createFilteredStream();
            ImageIO.write(img, suffix, out);
            out.close();
        }
    }
    doc.save("out/output1.pdf"); // save document
    doc.close();
}
/**
 * Draw a red rectangle to test
 * @param g graphics
 */
public static void paint(Graphics2D g) {
    int xpoints[] = {25, 245, 245, 25};
    int ypoints[] = {25, 25, 545, 545};
    g.setColor(Color.RED); // set the fill color before drawing
    g.fillPolygon(xpoints, ypoints, 4);
}
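The "Can't decode" case can be reproduced without PDFBox. The sketch below rests on an assumption that the code above does not confirm: that the stream behind a 'png'-suffix image holds raw pixel samples rather than a complete PNG file. `ImageIO.read` returns null when no registered reader recognizes the input, which would match the behavior described. The class and method names are made up for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import javax.imageio.ImageIO;

public class ImageIoDecodeCheck {
    static {
        // Keep everything in memory so no cache file is needed.
        ImageIO.setUseCache(false);
    }

    /** Returns true when ImageIO finds a reader that can decode the bytes. */
    public static boolean canDecode(byte[] data) throws IOException {
        return ImageIO.read(new ByteArrayInputStream(data)) != null;
    }

    /** Encodes a small blank image as a complete PNG file in memory. */
    public static byte[] encodedPng() throws IOException {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    /** Stand-in for raw samples: 4 x 4 pixels, 3 bytes each, no file header. */
    public static byte[] rawSamples() {
        byte[] data = new byte[4 * 4 * 3];
        Arrays.fill(data, (byte) 0x7F);
        return data;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("encoded PNG decodable: " + canDecode(encodedPng()));
        System.out.println("raw samples decodable: " + canDecode(rawSamples()));
    }
}
```

If the assumption holds, ImageIO alone cannot cover every image type in a PDF, because only some stream encodings are also standalone image file formats.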
Upvotes: 1
Views: 1491
Reputation: 123
Rather than working with the stream of a PDXObjectImage, it is better to create a new PDXObjectImage instance and replace the old one in the resources collection. This approach is more general. Use getRGBImage() to convert a PDXObjectImage to a BufferedImage, and a constructor (PDPixelMap, PDJpeg, etc.) to convert the edited result back to a PDXObjectImage. Note that you will still have problems with JBIG2 and JPEG 2000 images due to bugs. Here is the code I use to find and convert all images in a document:
// Recursive resource processor
// Images can also be nested inside PDXObjectForm objects
protected static void processResources(PDResources resources, PDDocument doc, String filename) throws IllegalArgumentException, SecurityException, IOException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JBIG2Exception, ColorSpaceException, ICCProfileException
{
    if (resources == null) return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null) return;
    // iterate over the XObjects
    Iterator<String> imageIter = xObjects.keySet().iterator();
    while (imageIter.hasNext())
    {
        String key = imageIter.next();
        PDXObject o = xObjects.get(key);
        if (o instanceof PDXObjectImage)
            xObjects.put(key, processImage((PDXObjectImage) o /*, some additional parms... */));
        if (o instanceof PDXObjectForm)
            processResources(((PDXObjectForm) o).getResources(), doc, filename);
    }
    // write the modified collection back to the resources
    resources.setXObjects(xObjects);
}
Note the resources.setXObjects() call at the end: without it, the changes you make to the collection obtained from resources.getXObjects() are not written back to the document.
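The body of processImage is not shown above. As a rough sketch of the editing step that could sit in the middle of it, the following uses only the JDK: it takes a BufferedImage (such as one returned by getRGBImage()), draws a red box over roughly the same area as the paint method in the question, and returns a new image. The class name `ImageEditStep`, the method name `drawRedBox`, and the choice to work on a plain RGB copy are assumptions made for illustration, not part of the original code.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class ImageEditStep {
    /**
     * Returns an edited RGB copy of the source image; the source is left untouched.
     */
    public static BufferedImage drawRedBox(BufferedImage source) {
        BufferedImage copy = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = copy.createGraphics();
        try {
            // Start from the original pixels, then paint on top of them.
            g.drawImage(source, 0, 0, null);
            g.setColor(Color.RED);
            // Box from (25, 25), 220 pixels wide and 520 pixels high.
            g.fillRect(25, 25, 220, 520);
        } finally {
            // Release the graphics context once drawing is finished.
            g.dispose();
        }
        return copy;
    }
}
```

A hypothetical processImage could call image.getRGBImage(), pass the result through drawRedBox, and hand the returned image to a PDPixelMap or PDJpeg constructor. That wiring is left out here because the exact constructors depend on the PDFBox version in use.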
Upvotes: 1