zaferaltun
zaferaltun

Reputation: 141

Seperated text line in Apache POI XWPFRun object

I 'm trying to replace a template DOCX document with Apache POI by using the XWPFDocument class. I have tags in the doc and a JSON file to read the replacement data. My problem is that a text line seems separated in a certain way in DOCX when I change its extension to ZIP file and open document.xml. For example [MEMBER_CONTACT_INFO] text becomes [MEMBER_CONTACT_INFO and ] separately. POI reads this in the same way since the DOCX original is like this. This creates 2 XWPFRun objects in the paragraph which show the text as [MEMBER_CONTACT_INFO and ] separately.

My question is, is there a way to force POI to run like Word via merging related runs or something like that? Or how can I solve this problem? I 'm matching run texts while replacing and I can't find my tag because it is split into 2 different run object.

Best

Upvotes: 13

Views: 7758

Answers (5)

Yehouda
Yehouda

Reputation: 122

To be sure that a word will be consider as a single XWPFRun, You can use merge_field as variable in word like that

  1. Place cursor on the word you want to be a single run.
  2. Press CTRL and F9 together and { } in gray will appear.
  3. Right-click on the { } field and select Edit Field.
  4. In pop-up box, select Mail Merge from Categories and then MergeField from Field Names.
  5. Click OK.

Upvotes: 1

EnGoPy
EnGoPy

Reputation: 353

For me it didn't work as I expected (every time). In my case I used "${PLACEHOLDER} in the text. At first we need to take a look how Apache Poi recognize each Paragraph which we want to iterate through with Runs. If you go deeper with docx file construction you will know that one run is a sequence of characters of text with the same font style/font size/colour/bold/italic etc. That way placeholder sometimes was divided into parts OR sometimes whole paragraph was recognized as a one Run and it was impossible to iterate through words.
What I did is to bold placeholder name in a template document. Than when iterating through RUN I was able to iterate through whole placeholder name ${PLACEHOLDER}. When I replaced that value with

for (XWPFRun r : p.getRuns()) {
  String text = r.getText(0);
  if (text != null && text.contains("originalText")) {
     text = text.replace("originalText", "newText");
     r.setText(text,0);
     }
  }

I've added just r.isBold(false); after setText.
That way placeholder is recognized as a different run -> I'm able to replace specific placeholder, and in the processed document I have no bolding, just a plain text.
For me one of a additional advantage was that visualy I'm able to faster find placeholders in text. So finally above loop looks like that:

for (XWPFRun r : p.getRuns()) {
      String text = r.getText(0);
      if (text != null && text.contains("originalText")) {
         text = text.replace("originalText", "newText");
         r.setText(text,0);
         r.isBold(false);
         }
      }

I hope it will help to someone, while I spend too much time for that :)

Upvotes: 1

kiraluo163
kiraluo163

Reputation: 31

Here is the java code to fix that separate text line issue. It will also handle the mult-format string replacement.

public static void replaceString(XWPFDocument doc, String search, String replace) throws Exception{
  for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    List<Integer> group = new ArrayList<Integer>();
    if (runs != null) {
      String groupText = search;
      for (int i=0 ; i<runs.size(); i++) {
        XWPFRun r = runs.get(i);
        String text = r.getText(0);
        if (text != null)
            if(text.contains(search)) {
              String safeToUseInReplaceAllString = Pattern.quote(search);
              text = text.replaceAll(safeToUseInReplaceAllString, replace);
              r.setText(text, 0);
            }
            else if(groupText.startsWith(text)){
              group.add(i);
              groupText = groupText.substring(text.length());
              if(groupText.isEmpty()){
                runs.get(group.get(0)).setText(replace, 0);
                for(int j = 1; j<group.size(); j++){
                  p.removeRun(group.get(j));
                }
                group.clear();
                groupText = search;
              }
            }else{
              group.clear();
              groupText = search;
            }
        }
    }
}
for (XWPFTable tbl : doc.getTables()) {
   for (XWPFTableRow row : tbl.getRows()) {
      for (XWPFTableCell cell : row.getTableCells()) {
         for (XWPFParagraph p : cell.getParagraphs()) {
            for (XWPFRun r : p.getRuns()) {
              String text = r.getText(0);
              if (text.contains(search)) {
                String safeToUseInReplaceAllString = Pattern.quote(search);
                text = text.replaceAll(safeToUseInReplaceAllString, replace);
                r.setText(text);
              }
            }
         }
      }
   }
}

}

Upvotes: 2

user2009750
user2009750

Reputation: 3197

This wasted so much of my time once...

Basically, an XWPFParagraph is composed of multiple XWPFRuns, and XWPFRun is a contagious text that has a fixed same style.

So when you try writing something like "[PLACEHOLDER_NAME]" in MS-Word it will create a single XWPFRun. But if you somehow add a few things more, and then you go back and change "[PLACEHOLDER_NAME]" to something else it is never guaranteed that it will remain a single XWPFRun it is quite possible that it will split to two Runs. AFAIK this is how MS-Word works.

How to avoid splitting of Runs in such cases?

Solution: There are two solutions that I know of:

  1. Copy text "[PLACEHOLDER_NAME]" to Notepad or something. Make your necessary modification and copy it back and paste it instead of "[PLACEHOLDER_NAME]" in your word file, this way your whole "[PLACEHOLDER_NAME]" will be replaced with new text avoiding splitting of XWPFRuns.

  2. Select "[PLACEHOLDER_NAME]" and then click of MS-Word "Replace" option and Replace with "[Your-new-edited-placeholder]" and this will guarantee that your new placeholder will consume a single XWPFRun.

If you have to change your new placeholder again, follow step 1 or 2.

Upvotes: 20

Romina Milea
Romina Milea

Reputation: 1

I also had this issue few days ago and I couldn't find any solution. I chose to use PLACEHOLDER_NAME instead of [PLACEHOLDER_NAME]. This is working fine for me and it's seen like a single XWPFRun object.

Upvotes: 0

Related Questions