Reputation: 3067
I'm using HTML Agility Pack to modify HTML output on the fly, finding all non-empty text nodes and replacing them:
const string xpath = "//*[not(self::script or self::style)]/text()[normalize-space(.) != '']";
var nodes = doc.DocumentNode.SelectNodes(xpath).ToList();
foreach (var htmlNode in nodes)
{
    var parent = htmlNode.ParentNode;
    var newNode = new HtmlNode(HtmlNodeType.Text, doc, 0) { InnerHtml = "Test" };
    parent.ReplaceChild(newNode, htmlNode);
}
But this seems to cause a problem if the text node isn't the only child of its parent. For example:
<label>Email:<br><input name="txtID" type="text" id="txtID" class="input"></label>
After the replacement, accessing doc.DocumentNode.OuterHtml throws the following exception: Unable to cast object of type 'HtmlAgilityPack.HtmlNode' to type 'HtmlAgilityPack.HtmlTextNode'.
Am I doing the replacement incorrectly? I can't really go and "clean up" all the original HTML documents that might run through this thing.
Upvotes: 1
Views: 886
Reputation: 32333
This looks like an inconsistency between the HtmlNode(HtmlNodeType, HtmlDocument, int) constructor you used and the way the InnerHtml and InnerText properties work. The constructor creates an object whose runtime type is the base HtmlNode class, yet it sets the node's NodeType to whatever value you pass (here, HtmlNodeType.Text). When you later read InnerHtml or InnerText on that node, the Agility Pack does something like this:
case HtmlNodeType.Text:
html = ((HtmlTextNode)this).Text;
which is what throws the InvalidCastException you mentioned.
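To make the mechanism concrete, here is a minimal, self-contained sketch of the same mismatch. It does not use the Agility Pack at all: Node, TextNode, NodeKind and Demo are hypothetical stand-ins I made up for illustration, not the library's real source.

```csharp
using System;

// Hypothetical stand-in for HtmlNodeType.
enum NodeKind { Element, Text }

// Hypothetical stand-in for HtmlNode: the constructor accepts any kind,
// even one that is meant to be backed by a derived class.
class Node
{
    public NodeKind Kind { get; }

    public Node(NodeKind kind) { Kind = kind; }

    public string InnerText
    {
        get
        {
            switch (Kind)
            {
                case NodeKind.Text:
                    // Trusts Kind and downcasts; throws InvalidCastException
                    // when the object is really a plain Node.
                    return ((TextNode)this).Text;
                default:
                    return string.Empty;
            }
        }
    }
}

// Hypothetical stand-in for HtmlTextNode.
class TextNode : Node
{
    public string Text { get; set; } = "";

    public TextNode() : base(NodeKind.Text) { }
}

static class Demo
{
    // Returns true when reading InnerText throws InvalidCastException.
    public static bool Throws(Node node)
    {
        try
        {
            _ = node.InnerText;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        // Base class tagged as Text: the downcast fails.
        Console.WriteLine(Throws(new Node(NodeKind.Text)));        // True
        // Real TextNode: the downcast succeeds.
        Console.WriteLine(Throws(new TextNode { Text = "Test" })); // False
    }
}
```

The point is only that the type tag and the runtime class can disagree when the base constructor is called directly, so any code that trusts the tag and downcasts will fail.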
To avoid this, I recommend creating the text nodes with the HtmlDocument.CreateTextNode() method instead, which returns a real HtmlTextNode:
foreach (var htmlNode in nodes)
{
    var parent = htmlNode.ParentNode;
    var newNode = doc.CreateTextNode();
    newNode.InnerHtml = "Test";
    parent.ReplaceChild(newNode, htmlNode);
}
This will replace your text nodes correctly.
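For completeness, here is a sketch of the whole thing as a small console program, run against the label example from the question. A few things in it are my own additions rather than part of the original code: the TextReplacer class and ReplaceTextNodes helper are hypothetical names, it uses the CreateTextNode(string) overload to set the text in one call, and it guards against SelectNodes returning null, which (as far as I know) is what the Agility Pack does when the XPath matches nothing. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class TextReplacer
{
    private const string XPath =
        "//*[not(self::script or self::style)]/text()[normalize-space(.) != '']";

    // Hypothetical helper: replaces every non-empty text node outside
    // script/style with the given text and returns how many were replaced.
    public static int ReplaceTextNodes(HtmlDocument doc, string replacement)
    {
        var matches = doc.DocumentNode.SelectNodes(XPath);
        if (matches == null)
        {
            return 0; // nothing matched
        }

        // Copy to a list first so the tree can be modified while iterating.
        var nodes = matches.ToList();
        foreach (var htmlNode in nodes)
        {
            // CreateTextNode returns an HtmlTextNode, so the cast inside
            // InnerHtml/InnerText succeeds later on.
            var newNode = doc.CreateTextNode(replacement);
            htmlNode.ParentNode.ReplaceChild(newNode, htmlNode);
        }
        return nodes.Count;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<label>Email:<br><input name=\"txtID\" type=\"text\" id=\"txtID\" class=\"input\"></label>");

        ReplaceTextNodes(doc, "Test");

        // Serializing no longer throws, because every text node is a real HtmlTextNode.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```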
Upvotes: 5