Peggy

Reputation: 15

How can I get footnotes and paragraphs from Apache POI XWPFDocument?

I have to extract all footnotes from an XWPFDocument. I have only found an example of how to do this with HWPFDocument. Any ideas?

FileInputStream fisv2 = new FileInputStream("C:\\abc.doc");
WordExtractor extractor = new WordExtractor(fisv2);
String[] fnts = extractor.getFootnoteText();
for (String s: fnts) {
  System.out.println(s + "-->\n");
}
extractor.close();

Upvotes: 0

Views: 931

Answers (1)

Axel Richter

Reputation: 61995

XWPFWordExtractor does not provide a method for extracting the footnotes separately, as WordExtractor does.

But XWPFDocument provides XWPFDocument.getFootnotes, which returns a java.util.List<XWPFFootnote>. So the individual footnotes can be read from that List.

Example:

import java.io.FileInputStream;
import java.util.List;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.*;

public class WordExtracFootnotes {

 public static void main(String[] args) throws Exception {

  // HWPF - binary *.doc format
  WordExtractor extractor = new WordExtractor(new FileInputStream("WordWithFootnotes.doc"));
  String[] hwpfFootnotes = extractor.getFootnoteText();
  for (String footnote : hwpfFootnotes) {
   System.out.println("[" + footnote + "]");
  }
  extractor.close();

  System.out.println();

  // XWPF - Office Open XML *.docx format
  XWPFDocument document = new XWPFDocument(new FileInputStream("WordWithFootnotes.docx"));

  List<XWPFFootnote> xwpfFootnotes = document.getFootnotes();
  for (XWPFFootnote footnote : xwpfFootnotes) {
   StringBuilder footnoteText = new StringBuilder();
   footnoteText.append("[" + footnote.getId() + ":");
   boolean first = true;
   for (XWPFParagraph paragraph : footnote.getParagraphs()) {
    if (!first) footnoteText.append("\n");
    first = false;
    footnoteText.append(paragraph.getText());
   } 
   footnoteText.append("]");
   System.out.println(footnoteText);
  }
  document.close();
 }
}

The footnotes with ids -1 and 0 must be ignored, since they are for internal use only and are never referenced in the document.
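The example above prints every footnote, including those two internal ones. A minimal sketch of the filtering step follows. It uses only the standard library and assumes that the footnote id is available as a java.math.BigInteger, which I believe is what footnote.getId() returns in recent POI versions; check this against the version in use. The class FootnoteIdFilter and its methods are hypothetical helpers, not part of Apache POI.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Apache POI.
class FootnoteIdFilter {

    // True for every id except the two internal ones (-1 and 0).
    // Assumption: a missing (null) id is treated as "not a user footnote".
    static boolean isUserFootnote(BigInteger id) {
        return id != null
                && !id.equals(BigInteger.ZERO)
                && !id.equals(BigInteger.valueOf(-1));
    }

    // Returns the ids that remain after dropping -1 and 0, in original order.
    static List<BigInteger> keepUserFootnoteIds(List<BigInteger> ids) {
        List<BigInteger> kept = new ArrayList<>();
        for (BigInteger id : ids) {
            if (isUserFootnote(id)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<BigInteger> ids = Arrays.asList(
                BigInteger.valueOf(-1), BigInteger.ZERO,
                BigInteger.ONE, BigInteger.valueOf(2));
        System.out.println(keepUserFootnoteIds(ids)); // prints [1, 2]
    }
}
```

In the loop over document.getFootnotes(), the same check could be applied as the first statement, for instance `if (!FootnoteIdFilter.isUserFootnote(footnote.getId())) continue;`, again assuming getId() yields a BigInteger.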

Upvotes: 3
