Reputation: 39

How can I extract the image from a button icon in a PDF using Apache PDFBox?

I want to get the icon image from a button in a PDF using Java (NetBeans) and put it in a panel, but I have hit a brick wall. I'm using PDFBox to work with the PDF and don't understand it well enough yet. I can already read the form fields, but as far as I can find there is no button extractor in PDFBox. How should I do this? Is it possible with PDFBox, or is there some other way around it? Thanks in advance.

Edit: I have already managed to extract images using the approach from the example utility, with this code:

       File myFile = new File(filename);
        try { 

            //PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
            PDDocument pdDoc = null;
            pdDoc = PDDocument.load( myFile );
            PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
            PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
            // used to read the file's contents

            List pages = pdDoc.getDocumentCatalog().getAllPages();
            Iterator iter = pages.iterator();
             while( iter.hasNext() )
             {
                 PDPage page = (PDPage)iter.next();
                 PDResources resources = page.getResources();
                 Map images = resources.getImages();
                 if( images != null )
                 {
                     Iterator imageIter = images.keySet().iterator();
                     while( imageIter.hasNext() )
                     {
                         String key = (String  )imageIter.next();
                         PDXObjectImage image = (PDXObjectImage)images.get(key);
                         BufferedImage imagedisplay= image.getRGBImage();
                         jLabel5.setIcon(new ImageIcon(imagedisplay)); // NOI18N                                 
                     }
                 }
             }


        } catch (Exception e) {
               JOptionPane.showMessageDialog(null, "error " + e.getMessage());


        }

However, I still fail to read the button images. By the way, I followed the tutorial on this page to add button images to the PDF: https://acrobatusers.com/tutorials/how-to-create-a-button-form-field-to-insert-a-pdf-file
2nd edit: Here is also the link to the PDF that has an icon in it: PDF Link. Thank you in advance.

Upvotes: 3

Answers (1)

mkl

Reputation: 95983

I assume you mean interactive form buttons when you talk about buttons in PDFs.

In general

There is no explicit icon extractor for buttons in PDFBox. But as buttons (and annotations in general) with custom icons have these icons defined as part of their appearances, one can simply (recursively) traverse the resources of the appearances of the annotations and collect the XObjects with subtype Image:

public void extractAnnotationImages(PDDocument document, String fileNameFormat) throws IOException
{
    List<PDPage> pages = document.getDocumentCatalog().getAllPages();
    if (pages == null)
        return;

    for (int i = 0; i < pages.size(); i++)
    {
        String pageFormat = String.format(fileNameFormat, "-" + i + "%s", "%s");
        extractAnnotationImages(pages.get(i), pageFormat);
    }
}

public void extractAnnotationImages(PDPage page, String pageFormat) throws IOException
{
    List<PDAnnotation> annotations = page.getAnnotations();
    if (annotations == null)
        return;

    for (int i = 0; i < annotations.size(); i++)
    {
        PDAnnotation annotation = annotations.get(i);
        String annotationFormat = annotation.getAnnotationName() != null && annotation.getAnnotationName().length() > 0
                ? String.format(pageFormat, "-" + annotation.getAnnotationName() + "%s", "%s")
                : String.format(pageFormat, "-" + i + "%s", "%s");
        extractAnnotationImages(annotation, annotationFormat);
    }
}

public void extractAnnotationImages(PDAnnotation annotation, String annotationFormat) throws IOException
{
    PDAppearanceDictionary appearance = annotation.getAppearance();
    if (appearance == null)
        return; // annotation without an appearance dictionary: nothing to extract
    extractAnnotationImages(appearance.getDownAppearance(), String.format(annotationFormat, "-Down%s", "%s"));
    extractAnnotationImages(appearance.getNormalAppearance(), String.format(annotationFormat, "-Normal%s", "%s"));
    extractAnnotationImages(appearance.getRolloverAppearance(), String.format(annotationFormat, "-Rollover%s", "%s"));
}

public void extractAnnotationImages(Map<String, PDAppearanceStream> stateAppearances, String stateFormat) throws IOException
{
    if (stateAppearances == null)
        return;

    for (Map.Entry<String, PDAppearanceStream> entry: stateAppearances.entrySet())
    {
        String appearanceFormat = String.format(stateFormat, "-" + entry.getKey() + "%s", "%s");
        extractAnnotationImages(entry.getValue(), appearanceFormat);
    }
}

public void extractAnnotationImages(PDAppearanceStream appearance, String appearanceFormat) throws IOException
{
    PDResources resources = appearance.getResources();
    if (resources == null)
        return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null)
        return;

    for (Map.Entry<String, PDXObject> entry : xObjects.entrySet())
    {
        PDXObject xObject = entry.getValue();
        String xObjectFormat = String.format(appearanceFormat, "-" + entry.getKey() + "%s", "%s");
        if (xObject instanceof PDXObjectForm)
            extractAnnotationImages((PDXObjectForm)xObject, xObjectFormat);
        else if (xObject instanceof PDXObjectImage)
            extractAnnotationImages((PDXObjectImage)xObject, xObjectFormat);
    }
}

public void extractAnnotationImages(PDXObjectForm form, String imageFormat) throws IOException
{
    PDResources resources = form.getResources();
    if (resources == null)
        return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null)
        return;

    for (Map.Entry<String, PDXObject> entry : xObjects.entrySet())
    {
        PDXObject xObject = entry.getValue();
        String xObjectFormat = String.format(imageFormat, "-" + entry.getKey() + "%s", "%s");
        if (xObject instanceof PDXObjectForm)
            extractAnnotationImages((PDXObjectForm)xObject, xObjectFormat);
        else if (xObject instanceof PDXObjectImage)
            extractAnnotationImages((PDXObjectImage)xObject, xObjectFormat);
    }
}

public void extractAnnotationImages(PDXObjectImage image, String imageFormat) throws IOException
{
    // close the file stream even if writing the image fails
    try (FileOutputStream out = new FileOutputStream(String.format(imageFormat, "", image.getSuffix())))
    {
        image.write2OutputStream(out);
    }
}

(from ExtractAnnotationImageTest.java)
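
The file names above are built by threading a format string through the recursion: each level fills the first placeholder with its own label plus a fresh "%s" and passes the second "%s" (the file suffix) through unchanged. A minimal stand-alone sketch of just that mechanism, using only String.format, might look like this. The class and method names and the labels are made up for illustration, and the sanitizing step is an addition that the code above does not have:

```java
final class FileNameFormatChain {
    /** Fills the first placeholder with "-label" and re-emits both placeholders for the next level. */
    static String descend(String format, String label) {
        // Assumption, not in the answer's code: replace anything that is not
        // file-name safe. In particular a '%' in a label would otherwise be
        // read as a format specifier by the next String.format call.
        String safe = label.replaceAll("[^A-Za-z0-9_-]", "_");
        return String.format(format, "-" + safe + "%s", "%s");
    }

    /** Ends the chain: nothing more is appended and the suffix fills the last placeholder. */
    static String finish(String format, String suffix) {
        return String.format(format, "", suffix);
    }

    public static void main(String[] args) {
        String format = "buttons%s.%s";
        format = descend(format, "0");      // page index
        format = descend(format, "1");      // annotation index (or name)
        format = descend(format, "Normal"); // appearance kind
        format = descend(format, "Im0");    // hypothetical XObject resource name
        System.out.println(finish(format, "jpg")); // prints "buttons-0-1-Normal-Im0.jpg"
    }
}
```

The methods above pass labels such as the annotation name through unchanged, so a name containing '%' or a path separator could break the format or the file path. Sanitizing the labels as sketched here is one way to guard against that.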

Unfortunately the OP did not provide a sample PDF, so I applied the code to this example file (stored as a resource) like this:

/**
 * Test using <a href="http://examples.itextpdf.com/results/part2/chapter08/buttons.pdf">buttons.pdf</a>
 * created by <a href="http://itextpdf.com/examples/iia.php?id=154">part2.chapter08.Buttons</a>
 * from ITEXT IN ACTION — SECOND EDITION.
 */
@Test
public void testButtonsPdf() throws IOException
{
    try (InputStream resource = getClass().getResourceAsStream("buttons.pdf");
         PDDocument document = PDDocument.load(resource))
    {
        extractAnnotationImages(document, new File(RESULT_FOLDER, "buttons%s.%s").toString());
    }
}

(from ExtractAnnotationImageTest.java)

and got two images.

There are two issues here:

We extract all image resources attached to the annotation appearance and do not check whether they actually are used anywhere in the appearance stream. Thus, you might find more icons than expected. In the case above, the first image is not used as individual resource but only as mask for the second one.
We extract only image resources, not inline images, and so may miss some images.

Thus, please check this code with your PDFs. If need be, it can be improved.
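
One possible direction for the first issue: in a PDF content stream an XObject is painted by the Do operator with the resource name as its operand (for example "/Im0 Do"), so the names actually used can be collected from the stream and compared with the resource keys. The following is only a rough sketch under strong assumptions: it works on the already decoded content stream as text (how to obtain it is not shown), it uses a regular expression instead of a real tokenizer such as PDFBox's PDFStreamParser, and it can therefore be fooled by the same characters inside string literals or inline image data. The class and method names are made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UsedXObjectNames {
    // A name operand followed by the Do operator, e.g. "/Im0 Do".
    // The lookahead keeps longer words such as "Done" from matching.
    private static final Pattern DO_CALL =
            Pattern.compile("/([^\\s/\\[\\]()<>{}%]+)\\s+Do(?=\\s|$)");

    /** Returns the XObject names painted via Do in the given decoded content stream text. */
    static Set<String> usedNames(String contentStream) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = DO_CALL.matcher(contentStream);
        while (matcher.find())
            names.add(matcher.group(1));
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical appearance stream: paints Im1 only, although the
        // resources might also contain an Im0 that merely serves as a mask.
        String content = "q 20 0 0 20 0 0 cm /Im1 Do Q";
        System.out.println(usedNames(content)); // prints "[Im1]"
    }
}
```

An image XObject whose key is not in the returned set would then be skipped during extraction. Form XObjects would need the same check applied to their own content streams.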

The OP's file

The OP meanwhile has provided a sample file imageicon.pdf

Calling the methods above like this

/**
 * Test using <a href="http://www.docdroid.net/TDGVQzg/imageicon.pdf.html">imageicon.pdf</a>
 * created by the OP.
 */
@Test
public void testImageiconPdf() throws IOException
{
    try (InputStream resource = getClass().getResourceAsStream("imageicon.pdf");
         PDDocument document = PDDocument.load(resource))
    {
        extractAnnotationImages(document, new File(RESULT_FOLDER, "imageicon%s.%s").toString());
    }
}

(from ExtractAnnotationImageTest.java)

outputs a single image.

Thus, it works just fine!
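
To show such an extracted file in a Swing component, as the OP intends, something along the following lines might do. This is a sketch that assumes the file was written in a format javax.imageio.ImageIO can decode on the Java runtime in use (typically JPEG or PNG); the class and method names are made up.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.swing.ImageIcon;

final class ExtractedIconLoader {
    /** Reads an extracted image file and wraps it in an icon for a label or button. */
    static ImageIcon loadIcon(File imageFile) throws IOException {
        BufferedImage image = ImageIO.read(imageFile);
        // ImageIO.read returns null when no registered reader understands the file.
        if (image == null)
            throw new IOException("No ImageIO reader available for " + imageFile);
        return new ImageIcon(image);
    }
}
```

With the OP's label from the question, the call would be roughly jLabel5.setIcon(ExtractedIconLoader.loadIcon(extractedFile)), where extractedFile stands for whichever file the extraction wrote.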

Starting as stand alone tool

The OP indicated in a comment to be

still confuse using junit testing method, however when i try to call it into my main program, it always return with "stream close" error. I already put the file as the same directory as my jar, also trying to give the path manually, but still the same error.

Thus, I added a main method to the class to allow it to be started without the JUnit framework and to extract from PDFs anywhere in the local file system, given by their file names on the command line.

In code:

public static void main(String[] args) throws IOException
{
    ExtractAnnotationImageTest extractor = new ExtractAnnotationImageTest();

    for (String arg : args)
    {
        try (PDDocument document = PDDocument.load(arg))
        {
            extractor.extractAnnotationImages(document, arg + "%s.%s");
        }
    }
}

(from ExtractAnnotationImageTest.java)

Upvotes: 5
