dinoooze

PDFBox: extract XFDF from a PDF

I have a text PDF file that is annotated (underlined, highlighted, etc.). The file does not contain any forms.

I am trying to extract the XFDF as a string using Apache PDFBox, so that I can persist the string to a database.

This is what I have so far, but the annotations do not end up in the output. The ExportXFDF approach does not work because my PDF contains no form data, only text, so the AcroForm is null.

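import java.io.IOException;
import java.io.StringWriter;
import java.nio.file.Path;
import java.util.List;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.io.RandomAccessReadBufferedFile;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.fdf.FDFAnnotation;
import org.apache.pdfbox.pdmodel.fdf.FDFDocument;

private String extractXFDF(Path path) throws ProcessingException {
    try (PDDocument pdfDoc = Loader.loadPDF(new RandomAccessReadBufferedFile(path.toFile().getAbsolutePath()));
         FDFDocument fdfDoc = new FDFDocument()) {
        // Copy the PDF's annotations into the FDF dictionary
        List<FDFAnnotation> fdfAnnotations = extractAnnotations(pdfDoc);
        fdfDoc.getCatalog().getFDF().setAnnotations(fdfAnnotations);

        // Serialize the FDF document as XFDF into a string
        var writer = new StringWriter();
        fdfDoc.saveXFDF(writer);
        return writer.toString();
    } catch (IOException e) {
        throw new ProcessingException(e.getMessage());
    }
}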

The method extractAnnotations() correctly returns a list of FDFAnnotation objects.

This is the output of extractXFDF(), which contains no annotations at all:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
</xfdf>
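A hedged guess at the cause: as far as I can tell from the PDFBox source, FDFDictionary.writeXML (which saveXFDF relies on) writes only the f, ids and fields entries, so annotations set with setAnnotations() would never reach the XFDF output. That is an assumption worth checking against the PDFBox version in use. If it holds, one workaround is to build the annots element by hand. The sketch below uses only the JDK's DOM and Transformer APIs. The XfdfSketch class and the Markup record are hypothetical stand-ins (not part of PDFBox) for values read from each PDF annotation (subtype, 0-based page index, rectangle, quad points, colour), and the attribute names page, rect, coords and color follow my reading of the XFDF specification.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.DoubleStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XfdfSketch {
    static final String XFDF_NS = "http://ns.adobe.com/xfdf/";

    // Hypothetical flattened view of one text-markup annotation (not a PDFBox type).
    // In a real program each field would be read from a PDFBox annotation object.
    public record Markup(String subtype, int pageIndex, double[] rect,
                         double[] quadPoints, String hexColor) {}

    // Joins numbers with commas, the list format assumed here for rect and coords.
    static String join(double[] values) {
        return DoubleStream.of(values)
                .mapToObj(v -> Double.toString(v))
                .collect(Collectors.joining(","));
    }

    // Builds <xfdf><annots>...</annots></xfdf> and returns it as a string.
    public static String toXfdf(List<Markup> markups) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(XFDF_NS, "xfdf");
            doc.appendChild(root);
            Element annots = doc.createElementNS(XFDF_NS, "annots");
            root.appendChild(annots);

            for (Markup m : markups) {
                // Element name is the lower-cased PDF subtype, e.g. "Highlight" -> "highlight"
                Element a = doc.createElementNS(XFDF_NS, m.subtype().toLowerCase(Locale.ROOT));
                a.setAttribute("page", Integer.toString(m.pageIndex())); // 0-based page index
                a.setAttribute("rect", join(m.rect()));                  // llx,lly,urx,ury
                a.setAttribute("coords", join(m.quadPoints()));          // 8 numbers per quad
                a.setAttribute("color", m.hexColor());                   // "#RRGGBB"
                annots.appendChild(a);
            }

            // Serialize the DOM tree; attribute values are escaped by the serializer
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XFDF", e);
        }
    }

    public static void main(String[] args) {
        Markup highlight = new Markup("Highlight", 0,
                new double[] {72, 700, 300, 712},
                new double[] {72, 712, 300, 712, 72, 700, 300, 700},
                "#FFFF00");
        System.out.println(toXfdf(List.of(highlight)));
    }
}
```

Running main prints an xfdf document with one highlight element inside annots. Building the tree with DOM rather than string concatenation leaves escaping to the serializer; further XFDF attributes such as name, date or title are omitted to keep the sketch short.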

Can anyone tell me what I should change to make this work?

Thanks in advance.
