Adnan Bin Zahir

Reputation: 111

How to get content of a page of a .docx file using Apache Poi?

I'm reading .docx files, including their styling information, with Apache POI. I do this by looping through each XWPFParagraph and processing every XWPFRun inside it. Now I want to get the content of each page. Is there a way to get the content page by page, or to find out which page a paragraph is on?

This is the body of a function that takes the absolute path of a .docx file and returns an array of strings:

        List<String> textList = new ArrayList<>();

        // try-with-resources closes the stream and the document even if parsing fails
        try (FileInputStream fis = new FileInputStream(absolutePath);
             XWPFDocument document = new XWPFDocument(fis)) {

            List<IBodyElement> bodyElements = document.getBodyElements();

            /*  I want to add some kind of outer loop here for each page
                and at the end of that loop I want to add a "<hr/>" tag in the textList
            */
            for (IBodyElement bodyElement : bodyElements) {             // Looping through body elements
                if (bodyElement.getElementType() == BodyElementType.PARAGRAPH) {
                    XWPFParagraph paragraph = (XWPFParagraph) bodyElement;

                    String textToAdd = parseParagraph(paragraph); // custom function to handle paragraphs

                    textList.add(textToAdd);
                }
            }
        }
        return textList.toArray(new String[0]);

As you can see, my goal is to add an <hr/> tag after each page. If I can somehow get the page number of a paragraph, or loop through the pages, I will be able to do that.
Please mention any other approach that might help.

Upvotes: 1

Views: 1866

Answers (1)

hiren

Reputation: 1105

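To get the page count from an XWPFDocument (for your outer loop), you can do something like the following. Note that this value is read from the document's extended properties, where it was stored by the application that last saved the file, so it can be stale or missing: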

XWPFDocument docx = new XWPFDocument(POIXMLDocument.openPackage(YOUR_FILE_PATH));

int numOfPages = docx.getProperties().getExtendedProperties().getUnderlyingProperties().getPages();

For the paragraph text:

for (XWPFParagraph p : docx.getParagraphs()) {
    System.out.println(p.getParagraphText()); // YOUR PARAGRAPH TEXT
}
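
The page count alone does not say where each page ends. One possible approach, sketched here with the plain JDK rather than POI, is to look for the page break markers stored in the document XML: w:lastRenderedPageBreak (written by Word at the last save, so other producers may not emit it) and w:br with w:type="page" (an explicit break). DocxPageSplitter, splitDocx and splitPages are hypothetical names for illustration. The sketch assumes the main part lives at word/document.xml, drops empty paragraphs, returns plain text without styling, and ignores other causes of a new page such as w:pageBreakBefore or section breaks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper (not part of Apache POI): splits document text at page break markers.
final class DocxPageSplitter {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static final String SEPARATOR = "<hr/>";

    /** Opens the .docx as a zip file; assumes the main part is word/document.xml. */
    static List<String> splitDocx(String absolutePath) {
        try (ZipFile zip = new ZipFile(absolutePath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IllegalArgumentException("No word/document.xml in " + absolutePath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return splitPages(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns text chunks in document order, with "<hr/>" where a page break marker occurs. */
    static List<String> splitPages(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations to avoid external entity attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document dom = factory.newDocumentBuilder().parse(documentXml);
            List<String> out = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            walk(dom, current, out);
            flush(current, out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    // Depth-first walk: collect w:t text, split at break markers, flush at the end of each w:p.
    private static void walk(Node node, StringBuilder current, List<String> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element e = (Element) child;
            boolean inWordNs = W_NS.equals(e.getNamespaceURI());
            String name = e.getLocalName();
            if (inWordNs && "t".equals(name)) {
                current.append(e.getTextContent());
            } else if (inWordNs && isPageBreak(e)) {
                flush(current, out);
                // Skip a leading separator and avoid two separators in a row.
                if (!out.isEmpty() && !SEPARATOR.equals(out.get(out.size() - 1))) {
                    out.add(SEPARATOR);
                }
            } else {
                walk(e, current, out);
                if (inWordNs && "p".equals(name)) {
                    flush(current, out);
                }
            }
        }
    }

    private static boolean isPageBreak(Element e) {
        String name = e.getLocalName();
        return "lastRenderedPageBreak".equals(name)
                || ("br".equals(name) && "page".equals(e.getAttributeNS(W_NS, "type")));
    }

    // Moves the collected text into the output list; empty chunks are dropped.
    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }
}
```

With this sketch, DocxPageSplitter.splitDocx(absolutePath).toArray(new String[0]) would give an array shaped like the one in the question, with "<hr/>" entries between pages. Because the rendered markers reflect the layout at the time Word last saved the file, the result is only as accurate as that save. The same markers sit in the XML underlying each XWPFRun, so the idea should carry over to a POI-based loop that keeps the styling.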

Upvotes: 1
