joe

Reputation: 17478

How to replace text in an XML document using Java

How do I replace text in an XML document using Java?

Source:

<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a Home Owners Agreement is that...</p>
</body>

Desired output:

<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a HOA is that...</p>
</body>

I only want text in <p> tags to be replaced. I tried the following:

void replaceText(String term, String replaceWith, org.w3c.dom.Node p) {
    // setTextContent replaces all of p's children with one Text node
    p.setTextContent(p.getTextContent().replace(term, replaceWith));
}

The problem with the above code is that all the child nodes of p get lost.
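A small sketch that makes the problem visible, using only the JDK's built-in DOM parser (the class name `SetTextContentDemo` and helper `parseRoot` are hypothetical). Per the DOM specification, calling `setTextContent` on an element removes all of its children and inserts a single Text node, which is why the `<b>` element disappears:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SetTextContentDemo {

    // Parses a small XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element p = parseRoot("<p>The <b>good</b> thing</p>");
        // Children: Text "The ", Element <b>, Text " thing"
        System.out.println(p.getChildNodes().getLength()); // 3

        // setTextContent removes every child and inserts a single Text node
        p.setTextContent(p.getTextContent().replace("good", "great"));
        System.out.println(p.getChildNodes().getLength());           // 1
        System.out.println(p.getElementsByTagName("b").getLength()); // 0
        System.out.println(p.getTextContent());                      // The great thing
    }
}
```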

Upvotes: 1

Views: 7404

Answers (2)

joe

Reputation: 17478

Okay, I figured out the solution.

The key is that you don't want to replace the text of the element node itself. The text is actually held in separate child Text nodes, and those are what you should change. I was able to accomplish what I needed with this code:

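import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Recursively rewrites only Text nodes, so element children such as <b> survive.
// Call it on each <p> element (not the document root) to leave <title> untouched.
private static void replace(Node root) {
    if (root.getNodeType() == Node.TEXT_NODE) {
        root.setTextContent(root.getTextContent().replace("Home Owners Agreement", "HOA"));
    }
    NodeList children = root.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        replace(children.item(i)); // recurse into each child
    }
}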
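For completeness, here is a self-contained sketch of the whole round trip with the JDK's standard `javax.xml` APIs: parse the string, apply the Text-node replacement only under `<p>` elements (found with `getElementsByTagName("p")`), and serialize back with an identity `Transformer`. The class and method names are hypothetical. One limitation to keep in mind: a term split across markup, such as `Home <b>Owners</b> Agreement`, spans several Text nodes and will not be matched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextNodeReplace {

    // Rewrites every Text node below `node`, leaving element children intact.
    static void replaceInTextNodes(Node node, String term, String replaceWith) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(term, replaceWith));
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), term, replaceWith);
        }
    }

    // Parses xml, applies the replacement only inside <p> elements, re-serializes.
    static String replaceInParagraphs(String xml, String term, String replaceWith) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList paragraphs = doc.getElementsByTagName("p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            replaceInTextNodes(paragraphs.item(i), term, replaceWith);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<body><title>Home Owners Agreement</title>"
                + "<p>The <b>good</b> thing about a Home Owners Agreement is that...</p></body>";
        System.out.println(replaceInParagraphs(xml, "Home Owners Agreement", "HOA"));
    }
}
```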

Upvotes: 2

AlexR

Reputation: 115328

The problem here is that you actually want to replace the node, not just its text. You can traverse the children of the current node, add them to a new node, and then replace the old node with the new one.
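A minimal sketch of that rebuild-and-swap idea, assuming the document was parsed with the JDK's built-in DOM parser; the class and method names are hypothetical. Direct Text children get the substitution, every other child is deep-copied with `cloneNode(true)`, and the new element is swapped in with `replaceChild`. Text inside nested elements such as `<b>` is not rewritten, and the original `<p>`'s attributes are not copied, which hints at how much bookkeeping this approach needs.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RebuildParagraph {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Builds a new element with the same tag name, rewrites direct Text
    // children, deep-copies all other children, then swaps it for oldP.
    static void rebuild(Element oldP, String term, String replaceWith) {
        Document doc = oldP.getOwnerDocument();
        Element newP = doc.createElement(oldP.getTagName());
        NodeList children = oldP.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                newP.appendChild(doc.createTextNode(child.getNodeValue().replace(term, replaceWith)));
            } else {
                newP.appendChild(child.cloneNode(true));
            }
        }
        oldP.getParentNode().replaceChild(newP, oldP);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<body><title>Home Owners Agreement</title>"
                + "<p>The <b>good</b> thing about a Home Owners Agreement is that...</p></body>");
        Element p = (Element) doc.getElementsByTagName("p").item(0);
        rebuild(p, "Home Owners Agreement", "HOA");
        System.out.println(doc.getElementsByTagName("p").item(0).getTextContent());
    }
}
```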

But this requires a lot of work and is very sensitive to your document structure. For example, if somebody wraps your <p> tag in a <div>, you will have to change your parsing.

Moreover, this approach is very inefficient in terms of CPU and memory: you have to parse the whole document just to change a couple of words in it.

My suggestion is to use regular expressions instead. In most cases they are powerful enough. For example, code like

xml = xml.replaceFirst("(<p>.*?</p>)", "<p>The <b>good</b> thing about a HOA is that...</p>");

will work (at least in your case).
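A slightly more general sketch of the same regex idea, using only `java.util.regex`: instead of hard-coding the whole paragraph, it finds each `<p ...>...</p>` block and replaces the term inside it. The class and method names are hypothetical. `Pattern.DOTALL` lets `.` match newlines, and `Matcher.quoteReplacement` keeps any `$` or `\` in the text from being read as group references. The usual regex-on-XML caveats still apply: nested `<p>` elements, CDATA sections, comments, or the term appearing in attribute values of the matched tag are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexParagraphReplace {

    // Lazily matches one <p ...> ... </p> block; DOTALL lets '.' span newlines.
    private static final Pattern P_BLOCK =
            Pattern.compile("<p\\b[^>]*>.*?</p>", Pattern.DOTALL);

    // Replaces `term` only inside <p> blocks; text outside them is copied as-is.
    static String replaceInParagraphs(String xml, String term, String replaceWith) {
        Matcher m = P_BLOCK.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = m.group().replace(term, replaceWith);
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<body>\n<title>Home Owners Agreement</title>\n"
                + "<p>The <b>good</b> thing about a Home Owners Agreement is that...</p>\n</body>";
        System.out.println(replaceInParagraphs(xml, "Home Owners Agreement", "HOA"));
    }
}
```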

Upvotes: 1
