Reputation: 154
I am currently trying to generate a new docx file from an existing template in dotx format. I want to change the firstname, lastname, etc. in the header but I'm not able to access them for some reason... My approach is the following:
public void generateDocX(Long id) throws IOException, InvalidFormatException {
//Get user per id
EmployeeDTO employeeDTO = employeeService.getEmployee(id);
//Location where the new docx file will be saved
FileOutputStream outputStream = new FileOutputStream(new File("/home/user/Documents/project/src/main/files/" + employeeDTO.getId() + "header.docx"));
//Get the template for generating the new docx file
File template = new File("/home/user/Documents/project/src/main/files/template.dotx");
OPCPackage pkg = OPCPackage.open(template);
XWPFDocument document = new XWPFDocument(pkg);
for (XWPFHeader header : document.getHeaderList()) {
List<XWPFParagraph> paragraphs = header.getParagraphs();
System.out.println("Total paragraphs in header are: " + paragraphs.size());
System.out.println("Total elements in the header are: " + header.getBodyElements().size());
for (XWPFParagraph paragraph : paragraphs) {
System.out.println("Paragraph text is: " + paragraph.getText());
List<XWPFRun> runs = paragraph.getRuns();
for (XWPFRun run : runs) {
String runText = run.getText(run.getTextPosition());
System.out.println("Run text is: " + runText);
}
}
}
//Write the changes to the new docx file and close the document
document.write(outputStream);
document.close();
}
The output in the console is either 1, null or empty string... I've tried several approaches from here, here and here but without any luck...
Here is what's inside the template.dotx
Upvotes: 1
Views: 911
Reputation: 61870
IBody.getParagraphs and IBody.getBodyElements only return the paragraphs or body elements that are directly in that IBody. Your paragraphs are not directly in the header; they are in a separate text box or text frame. That is why they cannot be retrieved this way.
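To illustrate the difference between "directly in the body" and "nested somewhere below it", here is a minimal sketch that uses only the JDK's XML and XPath classes, not Apache POI. The header XML is a hypothetical, heavily simplified stand-in for a real header part (a real text box has more wrapper elements between the run and w:txbxContent), and the class and method names are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it only inspects an XML string and does not touch POI.
class NestedRunsDemo {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Simplified assumption of a header: one direct paragraph whose only run
    // holds a text box, and the placeholder lives in a paragraph inside that box.
    static final String HEADER_XML =
          "<w:hdr xmlns:w='" + W + "'>"
        + "<w:p><w:r><w:pict><w:txbxContent>"
        + "<w:p><w:r><w:t>&lt;&lt;Firstname&gt;&gt;</w:t></w:r></w:p>"
        + "</w:txbxContent></w:pict></w:r></w:p>"
        + "</w:hdr>";

    // Counts the nodes matched by an XPath expression in HEADER_XML.
    // local-name() is used so that no NamespaceContext has to be set up.
    static int count(String xpathExpr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HEADER_XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Paragraphs that are direct children of the header: 1
        System.out.println(count("/*/*[local-name()='p']"));
        // Text elements in runs of those direct paragraphs: 0
        System.out.println(count("/*/*[local-name()='p']/*[local-name()='r']/*[local-name()='t']"));
        // Text elements anywhere below the header: 1
        System.out.println(count("//*[local-name()='t']"));
    }
}
```

Under these assumptions the header has exactly one direct paragraph with no text of its own, which is consistent with the "1, null or empty string" output in the question. The placeholder only shows up when searching all descendants.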
Since a *.docx file is a ZIP archive containing XML files for the document, headers and footers, one could get all text runs of an IBody by creating an XmlCursor that selects all w:r XML elements. For an XWPFHeader this could look like this:
private List<XmlObject> getAllCTRs(XWPFHeader header) {
CTHdrFtr ctHdrFtr = header._getHdrFtr();
XmlCursor cursor = ctHdrFtr.newCursor();
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
List<XmlObject> ctrInHdrFtr = new ArrayList<XmlObject>();
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
ctrInHdrFtr.add(obj);
}
return ctrInHdrFtr;
}
Now we have a list of all XML elements in that header which are text-run elements in Word.
We could have a more general getAllCTRs which gets all CTR elements from any kind of IBody, like so:
private List<XmlObject> getAllCTRs(IBody iBody) {
XmlCursor cursor = null;
List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();
if (iBody instanceof XWPFHeaderFooter) {
XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
cursor = ctHdrFtr.newCursor();
} else if (iBody instanceof XWPFDocument) {
XWPFDocument document = (XWPFDocument)iBody;
CTDocument1 ctDocument1 = document.getDocument();
cursor = ctDocument1.newCursor();
} else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
cursor = ctFtnEdn.newCursor();
}
if (cursor != null) {
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
ctrInIBody.add(obj);
}
}
return ctrInIBody ;
}
Now we have a list of all XML elements in that IBody which are text-run elements in Word.
Having that, we can get the text out of them like so:
private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
List<XmlObject> ctrInIBody = getAllCTRs(iBody);
for (XmlObject obj : ctrInIBody) {
CTR ctr = CTR.Factory.parse(obj.xmlText());
for (CTText ctText : ctr.getTList()) {
String text = ctText.getStringValue();
System.out.println(text);
}
}
}
This probably shows the next challenge, because Word is very messy in creating text-run elements. For example, your placeholder <<Firstname>> can be split into the text runs << + Firstname + >>. The reason might be different formatting, spell checking or something else. Even this is possible: << + Lastname + >>; << + YearOfBirth + >>. Or even this: <<Firstname + >> << + Lastname>>; << + YearOfBirth>>. You see, replacing the placeholders with text is nearly impossible because the placeholders may be split into multiple text runs.
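If the template cannot be cleaned up, one possible workaround is to search the concatenated text of all runs and then write the replacement back into the run where the match starts. The following is only a sketch of that idea on plain strings, one string per text run. The class and method names are hypothetical, it is not part of Apache POI, and it ignores everything a real run carries besides text (formatting, field characters, tabs, line breaks).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: operates on the texts of consecutive runs only.
class SplitPlaceholderDemo {

    // Replaces each occurrence of placeHolder in the concatenated run texts.
    // The value is written into the run where the match starts; the matched
    // characters are cut out of every other run the match overlaps.
    static List<String> replaceAcrossRuns(List<String> runTexts, String placeHolder, String value) {
        List<String> runs = new ArrayList<>(runTexts);
        if (placeHolder.isEmpty()) return runs;
        int searchFrom = 0;
        while (true) {
            String joined = String.join("", runs);
            int start = joined.indexOf(placeHolder, searchFrom);
            if (start < 0) break;
            int end = start + placeHolder.length(); // exclusive
            int offset = 0;
            boolean valueWritten = false;
            for (int i = 0; i < runs.size(); i++) {
                String text = runs.get(i);
                int runStart = offset;
                int runEnd = offset + text.length();
                offset = runEnd;
                if (runEnd <= start || runStart >= end) continue; // run not part of the match
                String before = text.substring(0, Math.max(start - runStart, 0));
                String after = text.substring(Math.min(end - runStart, text.length()));
                runs.set(i, before + (valueWritten ? "" : value) + after);
                valueWritten = true;
            }
            // Continue after the inserted value so a value containing the
            // placeholder cannot cause an endless loop.
            searchFrom = start + value.length();
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> runs = Arrays.asList("<<", "Firstname", ">> ", "<<Lastname>>");
        // prints [Axel, ,  , <<Lastname>>]
        System.out.println(replaceAcrossRuns(runs, "<<Firstname>>", "Axel"));
    }
}
```

With this approach the replacement text takes on the formatting of the run in which the placeholder starts, and the emptied runs stay in the document. Applying it to real CTR elements would additionally require mapping each string back to its w:t element, which is not shown here.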
To avoid this, the template.dotx needs to be created by users who know what they are doing.
First, turn spell check off, and grammar check as well. Otherwise every suspected spelling error or grammar violation is put into a separate text run so that it can be marked accordingly.
Second, make sure the whole placeholder is formatted uniformly. Differently formatted text must also be in separate text runs.
I am really skeptical that this will work properly. But try it yourself.
Complete example:
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import java.util.List;
import java.util.ArrayList;
public class WordEditAllIBodys {
private List<XmlObject> getAllCTRs(IBody iBody) {
XmlCursor cursor = null;
List<XmlObject> ctrInIBody = new ArrayList<XmlObject>();
if (iBody instanceof XWPFHeaderFooter) {
XWPFHeaderFooter headerFooter = (XWPFHeaderFooter)iBody;
CTHdrFtr ctHdrFtr = headerFooter._getHdrFtr();
cursor = ctHdrFtr.newCursor();
} else if (iBody instanceof XWPFDocument) {
XWPFDocument document = (XWPFDocument)iBody;
CTDocument1 ctDocument1 = document.getDocument();
cursor = ctDocument1.newCursor();
} else if (iBody instanceof XWPFAbstractFootnoteEndnote) {
XWPFAbstractFootnoteEndnote footEndnote = (XWPFAbstractFootnoteEndnote)iBody;
CTFtnEdn ctFtnEdn = footEndnote.getCTFtnEdn();
cursor = ctFtnEdn.newCursor();
}
if (cursor != null) {
cursor.selectPath("declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//*/w:r");
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
ctrInIBody.add(obj);
}
}
return ctrInIBody ;
}
private void printAllTextInTextRunsOfIBody(IBody iBody) throws Exception {
List<XmlObject> ctrInIBody = getAllCTRs(iBody);
for (XmlObject obj : ctrInIBody) {
CTR ctr = CTR.Factory.parse(obj.xmlText());
for (CTText ctText : ctr.getTList()) {
String text = ctText.getStringValue();
System.out.println(text);
}
}
}
private void replaceTextInTextRunsOfIBody(IBody iBody, String placeHolder, String textValue) throws Exception {
List<XmlObject> ctrInIBody = getAllCTRs(iBody);
for (XmlObject obj : ctrInIBody) {
CTR ctr = CTR.Factory.parse(obj.xmlText());
for (CTText ctText : ctr.getTList()) {
String text = ctText.getStringValue();
if (text != null && text.contains(placeHolder)) {
text = text.replace(placeHolder, textValue);
ctText.setStringValue(text);
obj.set(ctr);
}
}
}
}
public void generateDocX() throws Exception {
FileOutputStream outputStream = new FileOutputStream(new File("./" + 1234 + "header.docx"));
//Get the template for generating the new docx file
File template = new File("./template.dotx");
XWPFDocument document = new XWPFDocument(new FileInputStream(template));
//traverse all headers
for (XWPFHeader header : document.getHeaderList()) {
printAllTextInTextRunsOfIBody(header);
replaceTextInTextRunsOfIBody(header, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(header, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(header, "<<ProfessionalTitle>>", "Skeptic");
}
//traverse all footers
for (XWPFFooter footer : document.getFooterList()) {
printAllTextInTextRunsOfIBody(footer);
replaceTextInTextRunsOfIBody(footer, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(footer, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(footer, "<<ProfessionalTitle>>", "Skeptic");
}
//traverse document body; note: tables need not be traversed separately because they are in the document body
printAllTextInTextRunsOfIBody(document);
replaceTextInTextRunsOfIBody(document, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(document, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(document, "<<ProfessionalTitle>>", "Skeptic");
//traverse all footnotes
for (XWPFFootnote footnote : document.getFootnotes()) {
printAllTextInTextRunsOfIBody(footnote);
replaceTextInTextRunsOfIBody(footnote, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(footnote, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(footnote, "<<ProfessionalTitle>>", "Skeptic");
}
//traverse all endnotes
for (XWPFEndnote endnote : document.getEndnotes()) {
printAllTextInTextRunsOfIBody(endnote);
replaceTextInTextRunsOfIBody(endnote, "<<Firstname>>", "Axel");
replaceTextInTextRunsOfIBody(endnote, "<<Lastname>>", "Richter");
replaceTextInTextRunsOfIBody(endnote, "<<ProfessionalTitle>>", "Skeptic");
}
//since document was opened from *.dotx the content type needs to be changed
document.getPackage().replaceContentType(
"application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");
//Write the changes to the new docx file and close the document
document.write(outputStream);
outputStream.close();
document.close();
}
public static void main(String[] args) throws Exception {
WordEditAllIBodys app = new WordEditAllIBodys();
app.generateDocX();
}
}
Btw.: Since your document was opened from a *.dotx, the content type needs to be changed from wordprocessingml.template to wordprocessingml.document. Otherwise Word will not open the resulting *.docx document. See Converting a file with ".dotx" extension (template) to "docx" (Word File).
As I am skeptical about the replacing-placeholder-text approach, my preferred way is filling forms. See Problem with processing word document java. Of course such form fields cannot be used in headers or footers. So headers or footers should be created from scratch as a whole.
Upvotes: 3