Felipe

Reputation: 338

Extract PDF metadata using iText Java library

I am trying to read the complete XMP metadata stream of a PDF file using iText, the Java library for manipulating PDF files. The code I have written is:

package iTextExamples;

import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;

public class ReadMetadata {

    public static void main(String[] args) throws IOException {
        String src = "C:\\Path\\PDF123.pdf";
        
        PdfReader reader = new PdfReader(src);
        PdfDocument doc = new PdfDocument(reader);
        
        System.out.println(doc.getXmpMetadata());
        
        reader.close();
    }

}

The result I'm getting is null, and I don't know why.

Upvotes: 0

Views: 1856

Answers (1)

rhens

Reputation: 4871

Unrelated to the null issue: doc.getXmpMetadata() returns a byte array, so even when it is not null, the following prints only the array's type and hash code (something like [B@1b6d3586), not its content:

System.out.println(doc.getXmpMetadata());

Instead, decode the bytes into a String first (this needs import java.nio.charset.StandardCharsets):

byte[] xmp = doc.getXmpMetadata();
String xmpString = new String(xmp, StandardCharsets.UTF_8);
System.out.println(xmpString);
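Since getXmpMetadata() can return null, the decoding step is worth guarding. The following is a minimal sketch using only the JDK; the class name XmpText and the method decode are hypothetical names chosen for illustration, and it assumes the packet is UTF-8 encoded, as in the snippet above. In real use the byte array would come from doc.getXmpMetadata().

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of iText: null-safe decoding of an XMP packet.
final class XmpText {

    /** Decodes the raw XMP bytes as UTF-8; returns empty when there is no XMP (null input). */
    static Optional<String> decode(byte[] xmp) {
        if (xmp == null) {
            return Optional.empty();
        }
        return Optional.of(new String(xmp, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Stand-in bytes; in real code this would be doc.getXmpMetadata().
        byte[] sample = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(sample).orElse("(no XMP metadata)"));
        System.out.println(decode(null).orElse("(no XMP metadata)"));
    }
}
```

Returning an Optional makes the "no metadata" case explicit at the call site instead of surfacing later as a NullPointerException inside the String constructor.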

About the null issue:

I assume you're trying to get the document-level XMP metadata. Make sure that your PDF file actually contains an XMP metadata stream. If it does not, null is the expected result.

You can verify this with a PDF viewer that is able to show XMP, or with a PDF object viewer. The XMP metadata stream sits in the Metadata entry of the document's Catalog dictionary:

iText RUPS: metadata entry
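If no viewer is at hand, a rough first check is to scan the raw file for XMP markers. This is only a heuristic sketch with hypothetical names (XmpProbe, looksLikeXmp), and it rests on the assumption that the metadata stream is stored uncompressed, unencrypted, and in a single-byte-compatible encoding such as UTF-8. A match does not prove the XMP is document-level (it could belong to a page or an image), and a miss does not prove it is absent, so an object viewer such as RUPS remains the reliable check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText: heuristic scan for an XMP packet.
final class XmpProbe {

    /** Returns true if the raw bytes contain a plain-text XMP packet header or root element. */
    static boolean looksLikeXmp(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("<?xpacket begin") || raw.contains("<x:xmpmeta");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XmpProbe <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksLikeXmp(bytes)
                ? "Found XMP markers in the file"
                : "No plain-text XMP markers found");
    }
}
```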

Upvotes: 1
