PDFBox: extract XFDF from a PDF

Question

I have a text PDF file that contains annotations (underlines, highlights, etc.). The PDF does not contain any form fields.

I am trying to extract the annotations as an XFDF string using Apache PDFBox so that I can persist the string to a database.

This is what I have so far, but it does not export the annotations correctly. Using ExportXFDF does not work because my PDF contains no form data, only text, so the AcroForm is null.

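import java.io.IOException;
import java.io.StringWriter;
import java.nio.file.Path;
import java.util.List;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.io.RandomAccessReadBufferedFile;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.fdf.FDFAnnotation;
import org.apache.pdfbox.pdmodel.fdf.FDFDocument;

private String extractXFDF(Path path) throws ProcessingException {
    // Load the source PDF (PDFBox 3.x API) and create an empty FDF container
    try (PDDocument pdfDoc = Loader.loadPDF(new RandomAccessReadBufferedFile(path.toFile().getAbsolutePath()));
         FDFDocument fdfDoc = new FDFDocument()) {
        // extractAnnotations() is my own helper (not shown); it returns typed FDF annotations
        List<FDFAnnotation> fdfAnnotations = extractAnnotations(pdfDoc);
        fdfDoc.getCatalog().getFDF().setAnnotations(fdfAnnotations);

        // Serialize the FDF as XFDF into an in-memory string
        var writer = new StringWriter();
        fdfDoc.saveXFDF(writer);
        return writer.toString();
    } catch (IOException e) {
        throw new CustomException(e.getMessage());
    }
}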

The method extractAnnotations() correctly returns a list of FDFAnnotation objects.
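For comparison, an XFDF string holding markup annotations carries them inside an `<annots>` element rather than `<fields>`. The sketch below builds such a string using only the JDK's StAX writer (`javax.xml.stream`), as an alternative to `FDFDocument.saveXFDF`. Assumptions, none verified against PDFBox itself: the `Markup` record and the `toXfdf` method are hypothetical names, and in real code their values would have to be copied from each `PDAnnotation` (subtype, zero-based page index, `/Rect`, `/QuadPoints`). The namespace `http://ns.adobe.com/xfdf/`, the lowercase element names (`highlight`, `underline`) and the `page`, `rect` and `coords` attributes follow Adobe's XFDF specification. It is also an assumption, worth checking in the PDFBox source for the version in use, that `saveXFDF` writes only form fields and skips annotations, which would match the symptom described above.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XfdfAnnotsSketch {

    // XFDF namespace as defined by Adobe's XFDF specification
    static final String XFDF_NS = "http://ns.adobe.com/xfdf/";

    // Hypothetical plain-data holder; real values would be copied from each PDAnnotation
    // (subtype, zero-based page index, /Rect and /QuadPoints arrays).
    record Markup(String subtype, int page, float[] rect, float[] quadPoints) {}

    // Joins numbers as "a,b,c,d", the comma-separated form used for rect and coords
    static String join(float[] values) {
        StringJoiner joiner = new StringJoiner(",");
        for (float v : values) {
            joiner.add(Float.toString(v));
        }
        return joiner.toString();
    }

    // Writes an <xfdf> document whose <annots> element holds one empty element per markup
    static String toXfdf(List<Markup> markups) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("xfdf");
            xml.writeDefaultNamespace(XFDF_NS);
            xml.writeAttribute("xml:space", "preserve");
            xml.writeStartElement("annots");
            for (Markup m : markups) {
                // Subtype "Highlight" becomes the element <highlight .../>
                xml.writeEmptyElement(m.subtype().toLowerCase(Locale.ROOT));
                xml.writeAttribute("page", Integer.toString(m.page()));
                xml.writeAttribute("rect", join(m.rect()));
                xml.writeAttribute("coords", join(m.quadPoints()));
            }
            xml.writeEndElement(); // closes <annots>
            xml.writeEndElement(); // closes <xfdf>
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XFDF", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Markup highlight = new Markup("Highlight", 0,
                new float[] {72f, 700f, 200f, 714f},
                new float[] {72f, 714f, 200f, 714f, 72f, 700f, 200f, 700f});
        System.out.println(toXfdf(List.of(highlight)));
    }
}
```

Calling `toXfdf` with an empty list still yields a well-formed document with an empty `<annots>` element. A streaming writer is used instead of string concatenation so that attribute values are escaped automatically; further attributes such as `name`, `color` or `date` could be added the same way with `writeAttribute`.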

This is the response I get from this extractXFDF() method:

Can anyone tell me what I need to change to make this work?

Thanks in advance.
