Extract PDF metadata using iText Java library

Question

I am trying to get the XMP metadata stream of a PDF file using iText, the Java library for manipulating PDF files. The code I have written is:

package iTextExamples;

import java.io.IOException;

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;

public class ReadMetadata {

    public static void main(String[] args) throws IOException {
        // Backslashes must be escaped in Java string literals
        String src = "C:\\Path\\PDF123.pdf";

        PdfReader reader = new PdfReader(src);
        PdfDocument doc = new PdfDocument(reader);

        // Print the document-level XMP metadata
        System.out.println(doc.getXmpMetadata());

        // Closing the PdfDocument also closes its PdfReader
        doc.close();
    }

}

The result I'm getting is null, and I don't know why.

rhens · Accepted Answer

Unrelated to the null issue: doc.getXmpMetadata() returns a byte array, so you will not be able to print its content with

System.out.println(doc.getXmpMetadata());

Instead, you'll have to do something like:

import java.nio.charset.StandardCharsets;

byte[] xmp = doc.getXmpMetadata();
// XMP packets are usually UTF-8 encoded XML
String xmpString = new String(xmp, StandardCharsets.UTF_8);
System.out.println(xmpString);
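Note that new String(xmp, StandardCharsets.UTF_8) throws a NullPointerException when xmp is null, which is exactly the case where the PDF has no XMP stream. A minimal null-safe sketch, using plain Java only (the class and method names XmpText and decode are hypothetical, and UTF-8 is assumed as the packet encoding):

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class XmpText {

    /** Decodes an XMP packet, or returns Optional.empty() when there is none. */
    static Optional<String> decode(byte[] xmp) {
        if (xmp == null) {
            // e.g. the PDF has no document-level XMP metadata stream
            return Optional.empty();
        }
        // Assumes UTF-8, the usual XMP encoding; adjust if your files use UTF-16
        return Optional.of(new String(xmp, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] sample = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>".getBytes(StandardCharsets.UTF_8);

        // Printing the array itself only shows its type and identity hash ("[B@...")
        System.out.println(sample);

        // Printing the decoded text shows the actual XML
        System.out.println(decode(sample).orElse("(no XMP metadata)"));
        System.out.println(decode(null).orElse("(no XMP metadata)"));
    }
}
```

In your own code you would pass doc.getXmpMetadata() to decode instead of the hard-coded sample.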

About the null issue:

I assume you're trying to get the document-level XMP metadata. Make sure that your PDF file actually contains an XMP metadata stream; if it doesn't, null is expected.

You can verify this with a PDF viewer that can display XMP, or with a PDF object viewer. The XMP metadata stream is the value of the Metadata entry in the document's Catalog dictionary.
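The same check can also be done in code. A minimal sketch, assuming iText 7 or later (kernel module): it takes the Catalog dictionary via doc.getCatalog().getPdfObject() and looks up its Metadata entry with PdfDictionary.getAsStream(PdfName.Metadata). If that returns null, the file has no document-level XMP stream, and a null from getXmpMetadata() is the expected result. The class name CatalogMetadataCheck and the default file name are placeholders.

```java
import com.itextpdf.kernel.pdf.PdfDictionary;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfStream;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CatalogMetadataCheck {

    /**
     * Returns the decoded bytes of the Catalog's /Metadata stream,
     * or null when the Catalog has no such entry.
     */
    static byte[] catalogXmp(PdfDocument doc) {
        PdfDictionary catalog = doc.getCatalog().getPdfObject();
        // getAsStream returns null if /Metadata is missing or is not a stream
        PdfStream metadata = catalog.getAsStream(PdfName.Metadata);
        // getBytes() returns the stream data with its filters decoded
        return metadata == null ? null : metadata.getBytes();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path: pass your own file as the first argument
        String src = args.length > 0 ? args[0] : "PDF123.pdf";
        try (PdfDocument doc = new PdfDocument(new PdfReader(src))) {
            byte[] xmp = catalogXmp(doc);
            if (xmp == null) {
                System.out.println("The Catalog has no Metadata stream, so no document-level XMP.");
            } else {
                System.out.println(new String(xmp, StandardCharsets.UTF_8));
            }
        }
    }
}
```

If this reports no Metadata stream, the PDF simply has no document-level XMP, and there is nothing for getXmpMetadata() to return.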
