Edit header in dotx/docx file

Question

I am currently trying to generate a new docx file from an existing template in dotx format. I want to change the firstname, lastname, etc. in the header but I'm not able to access them for some reason... My approach is the following:

 public void generateDocX(Long id) throws IOException, InvalidFormatException {

    //Get user per id
    EmployeeDTO employeeDTO = employeeService.getEmployee(id);

    //Location where the new docx file will be saved
    FileOutputStream outputStream = new FileOutputStream(new File("/home/user/Documents/project/src/main/files/" + employeeDTO.getId() + "header.docx"));

    //Get the template for generating the new docx file
    File template = new File("/home/user/Documents/project/src/main/files/template.dotx");
    OPCPackage pkg = OPCPackage.open(template);
    XWPFDocument document = new XWPFDocument(pkg);

    for (XWPFHeader header : document.getHeaderList()) {
        List paragraphs = header.getParagraphs();
        System.out.println("Total paragraphs in header are: " + paragraphs.size());
        System.out.println("Total elements in the header are: " + header.getBodyElements().size());
        for (XWPFParagraph paragraph : paragraphs) {
            System.out.println("Paragraph text is: " + paragraph.getText());
            List runs = paragraph.getRuns();
            for (XWPFRun run : runs) {
                String runText = run.getText(run.getTextPosition());
                System.out.println("Run text is: " + runText);
            }
        }
    }

    //Write the changes to the new docx file and close the document
    document.write(outputStream);
    document.close();
}

The output in the console is either 1, null or empty string... I've tried several approaches from here, here and here but without any luck...

Here is what's inside the template.dotx

Axel Richter · Accepted Answer

IBody.getParagraphs and IBody.getBodyElements- only get the paragraphs or body elements which are directly in that IBody. But your paragraphs are not directly in there but are in a separate text box or text frame. That's why they cannot be got this way.

Since *.docx is a ZIP archive containijg XML files for document, headers and footers, one could get all text runs of one IBody by creating a XmlCursor which selects all w:r XML elements. For a XWPFHeader this could look like so:

 private List getAllCTRs(XWPFHeader header) {
  CTHdrFtr ctHdrFtr = header._getHdrFtr();
  XmlCursor cursor = ctHdrFtr.newCursor();
  cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
  List ctrInHdrFtr = new ArrayList();
  while (cursor.hasNextSelection()) {
   cursor.toNextSelection();
   XmlObject obj = cursor.getObject();
   ctrInHdrFtr.add(obj);
  }
  return ctrInHdrFtr;
 }

Now we have a list of all XML elements in that header which are text-run-elements in Word.

We could have a more general getAllCTRs which gets all CTR elements from any kind of IBody like so:

 private List getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List ctrInIBody = new ArrayList();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

Now we have a list of all XML elements in that IBody which are text-run-elements in Word.

Having that we can get the text out of them like so:

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

This probably shows the next challenge. Because Word is very messy in creating text-run-elements. For example your placeholder <> can be split into text-runs << + Firstname + >>. The reason migt be different formatting or spell checking or something else. Even this is possible: << + Lastname + >>; << + YearOfBirth + >>. Or even this: < + >> << + Lastname>>; << + YearOfBirth>>. You see, replacing the placeholders with text is nearly impossible because the placeholders may be split into multiple tex-runs.



To avoid this the template.dotx needs to be created from users who know what they are doing.

At first turn spell check off. Grammar check as well. If not, all found possible spell errors or grammar violations are in separate text-runs to mark them accordingly.

Second make sure the whole placeholder is eaqual formatted. Different formatted text also must be in separate text-runs.

I am really skeptic that this will work properly. But try it yourself.

Complete example:

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;

import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;

import java.util.List;
import java.util.ArrayList;

public class WordEditAllIBodys {

 private List getAllCTRs(IBody iBody) {
  XmlCursor cursor = null;
  List ctrInIBody = new ArrayList();

  if (iBody instanceof XWPFHeaderFooter) {
   XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
   CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
   cursor = ctHdrFtr.newCursor();
  } else if (iBody instanceof XWPFDocument) {
   XWPFDocument document = (XWPFDocument)iBody;
   CTDocument1 ctDocument1 = document.getDocument();
   cursor = ctDocument1.newCursor();
  } else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
   XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
   CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
   cursor = ctFtnEdn.newCursor();
  }

  if (cursor != null) {
   cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
   while (cursor.hasNextSelection()) {
    cursor.toNextSelection();
    XmlObject obj = cursor.getObject();
    ctrInIBody.add(obj);
   }
  }
  return ctrInIBody ;
 }

 private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
  List ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    System.out.println(text);
   }
  }
 }

 private void replaceTextInTextRunsOfIBody(IBody iBody, String placeHolder, String textValue) throws Exception {
  List ctrInIBody = getAllCTRs(iBody);
  for (XmlObject obj : ctrInIBody) {
   CTR ctr = CTR.Factory.parse(obj.xmlText());
   for (CTText ctText : ctr.getTList()) {
    String text = ctText.getStringValue();
    if (text != null && text.contains(placeHolder)) {
     text = text.replace(placeHolder, textValue);
     ctText.setStringValue(text);
     obj.set(ctr);
    }
   }
  }
 }

 public void generateDocX() throws Exception {

  FileOutputStream outputStream = new FileOutputStream(new File("./" + 1234 + "header.docx"));

  //Get the template for generating the new docx file
  File template = new File("./template.dotx");
  XWPFDocument document = new XWPFDocument(new FileInputStream(template));

  //traverse all headers
  for (XWPFHeader header : document.getHeaderList()) {
   printAllTextInTextRunsOfIBody(header);

   replaceTextInTextRunsOfIBody(header, "<>", "Axel");
   replaceTextInTextRunsOfIBody(header, "<>", "Richter");
   replaceTextInTextRunsOfIBody(header, "<>", "Skeptic");
  }  

  //traverse all footers
  for (XWPFFooter footer : document.getFooterList()) {
   printAllTextInTextRunsOfIBody(footer);

   replaceTextInTextRunsOfIBody(footer, "<>", "Axel");
   replaceTextInTextRunsOfIBody(footer, "<>", "Richter");
   replaceTextInTextRunsOfIBody(footer, "<>", "Skeptic");
  }  

  //traverse document body; note: tables needs not be traversed separately because they are in document body
  printAllTextInTextRunsOfIBody(document);

  replaceTextInTextRunsOfIBody(document, "<>", "Axel");
  replaceTextInTextRunsOfIBody(document, "<>", "Richter");
  replaceTextInTextRunsOfIBody(document, "<>", "Skeptic");

  //traverse all footnotes
  for (XWPFFootnote footnote : document.getFootnotes()) {
   printAllTextInTextRunsOfIBody(footnote);

   replaceTextInTextRunsOfIBody(footnote, "<>", "Axel");
   replaceTextInTextRunsOfIBody(footnote, "<>", "Richter");
   replaceTextInTextRunsOfIBody(footnote, "<>", "Skeptic");
  }  

  //traverse all endnotes
  for (XWPFEndnote endnote : document.getEndnotes()) {
   printAllTextInTextRunsOfIBody(endnote);

   replaceTextInTextRunsOfIBody(endnote, "<>", "Axel");
   replaceTextInTextRunsOfIBody(endnote, "<>", "Richter");
   replaceTextInTextRunsOfIBody(endnote, "<>", "Skeptic");
  }  


  //since document was opened from *.dotx the content type needs to be changed
  document.getPackage().replaceContentType(
   "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
   "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");

  //Write the changes to the new docx file and close the document
  document.write(outputStream);
  outputStream.close();
  document.close();
 }

 public static void main(String[] args) throws Exception {
  WordEditAllIBodys app = new WordEditAllIBodys();
  app.generateDocX();
 }
}


Btw.: Since your document was opened from *.dotx the content type needs to be changed from wordprocessingml.template to wordprocessingml.document. Else Word will not open the resulting *.docx document. See Converting a file with ".dotx" extension (template) to "docx" (Word File).

As I am skeptic about the replacing-placeholder-text-approach, my preferred way is filling forms. See Problem with processing word document java. Of course such form fields cannot be used in header or footer. So headers or footers schould be created form scratch at whole.

Edit header in dotx/docx file

Answers (1)

Related Questions