Reputation: 12861
I want to replace inner text of HTML tags with another text.
I am using HtmlAgilityPack
I use this code to extract all texts
HtmlDocument doc = new HtmlDocument();
doc.Load("some path")
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']")) {
// How to replace node.InnerText with some text ?
}
But InnerText is readonly. How can I replace texts with another text and save them to file ?
Upvotes: 32
Views: 26449
Reputation: 19661
The HtmlTextNode
class has a Text
property* which works perfectly for this purpose.
Here's an example:
var textNodes = doc.DocumentNode.SelectNodes("//body//text()").Cast<HtmlTextNode>();
foreach (var node in textNodes)
{
node.Text = node.Text.Replace("foo", "bar");
}
And if we have an HtmlNode
that we want to change its direct text, we can do something like the following:
HtmlNode node = //...
var textNode = (HtmlTextNode)node.SelectSingleNode("text()");
textNode.Text = "new text";
Or we can use node.SelectNodes("text()")
in case it has more than one.
* Not to be confused with the readonly InnerText
property.
Upvotes: 8
Reputation: 1191
Strange, but I found that InnerHtml isn't readonly. And when I tried to set it like that
aElement.InnerHtml = "sometext";
the value of InnerText
also changed to "sometext"
Upvotes: 17
Reputation: 22478
Try code below. It select all nodes without children and filtered out script nodes. Maybe you need to add some additional filtering. In addition to your XPath expression this one also looking for leaf nodes and filter out text content of <script>
tags.
var nodes = doc.DocumentNode.SelectNodes("//body//text()[(normalize-space(.) != '') and not(parent::script) and not(*)]");
foreach (HtmlNode htmlNode in nodes)
{
htmlNode.ParentNode.ReplaceChild(HtmlTextNode.CreateNode(htmlNode.InnerText + "_translated"), htmlNode);
}
Upvotes: 24