Reputation: 619
Here is my simplified HTML:
<html>
<body>
<div id="mainDiv">
<div id="divToRemove"></div>
<div id="divToKeep"></div>
<div class="divToRemove"></div>
<div class="divToRemove"></div>
</div>
</body>
</html>
I want to remove the divs whose ID or class is "divToRemove", and then select only the div with the ID "mainDiv" (as an HtmlNode).
The results should be:
<div id="mainDiv">
<div id="divToKeep"></div>
</div>
How can I do that using Html Agility Pack?
Thanks!
Upvotes: 4
Views: 8286
Reputation: 12768
Personally, I prefer to use the LINQ methods of HtmlAgilityPack. The query will be long but relatively straightforward: select the nodes with the right id and/or class, and then call the Remove() method on each of them.
// ToList() takes a snapshot first, so nodes can be removed safely inside the loop
foreach (var node in doc.DocumentNode.Descendants("div")
    .Where(n => n.Id.Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase)
        || n.GetAttributeValue("class", string.Empty).Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase))
    .ToList())
    node.Remove();
HtmlNode mainNode = doc.DocumentNode.Descendants("div").Where(n => n.Id.Equals("mainDiv", StringComparison.InvariantCultureIgnoreCase)).First();
Upvotes: 1
Reputation: 9244
The following code is adapted from this Html Agility Pack forum page to fit your needs. Essentially, we grab all the divs, loop through them, and check each one's class or id for a match. If there is a match, we remove the div.
var divs = htmldoc.DocumentNode.SelectNodes("//div");
if (divs != null)
{
    foreach (var tag in divs)
    {
        if (tag.Attributes["class"] != null && string.Compare(tag.Attributes["class"].Value, "divToRemove", StringComparison.InvariantCultureIgnoreCase) == 0)
        {
            tag.Remove();
        }
        else if (tag.Attributes["id"] != null && string.Compare(tag.Attributes["id"].Value, "divToRemove", StringComparison.InvariantCultureIgnoreCase) == 0)
        {
            tag.Remove();
        }
    }
}
You can also combine these if statements into one large if statement, but I thought this read better for the answer.
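For reference, the combined check could look roughly like the sketch below. The DivCleaner class and its HasIdOrClass and RemoveAndSelect methods are names made up for this illustration, not part of Html Agility Pack. The sketch assumes the HTML arrives as a string and that an exact, case-insensitive match on the whole attribute value is what you want.

```csharp
using System;
using HtmlAgilityPack;

public static class DivCleaner
{
    // Hypothetical helper: true when the node's id or class attribute equals
    // the given name (case-insensitive, whole attribute value).
    public static bool HasIdOrClass(HtmlNode node, string name)
    {
        return string.Equals(node.GetAttributeValue("id", string.Empty), name, StringComparison.OrdinalIgnoreCase)
            || string.Equals(node.GetAttributeValue("class", string.Empty), name, StringComparison.OrdinalIgnoreCase);
    }

    // Hypothetical wrapper: removes the matching divs, then returns mainDiv.
    public static HtmlNode RemoveAndSelect(string html)
    {
        var htmldoc = new HtmlDocument();
        htmldoc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, hence the check.
        var divs = htmldoc.DocumentNode.SelectNodes("//div");
        if (divs != null)
        {
            foreach (var tag in divs)
            {
                // One condition covers both the class and the id case.
                if (HasIdOrClass(tag, "divToRemove"))
                {
                    tag.Remove();
                }
            }
        }

        return htmldoc.DocumentNode.SelectSingleNode("//div[@id='mainDiv']");
    }
}
```

Using GetAttributeValue with a default value avoids the separate null checks on Attributes["class"] and Attributes["id"].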
Finally, select the node you were looking for...
var mainDiv = htmldoc.DocumentNode.SelectSingleNode("//div[@id='mainDiv']");
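If a case-sensitive, exact match is acceptable, the filtering can probably be pushed into the XPath expression itself instead of checking attributes in the loop. This is only a sketch under that assumption; XPathDivCleaner and RemoveAndSelect are made-up names for the example, and unlike the loop above, this version will not match "DivToRemove" or a class attribute that lists several classes.

```csharp
using HtmlAgilityPack;

public static class XPathDivCleaner
{
    // Hypothetical wrapper: removes the matching divs, then returns mainDiv.
    public static HtmlNode RemoveAndSelect(string html)
    {
        var htmldoc = new HtmlDocument();
        htmldoc.LoadHtml(html);

        // The XPath predicate selects only divs whose id or class is exactly "divToRemove".
        var toRemove = htmldoc.DocumentNode.SelectNodes("//div[@id='divToRemove' or @class='divToRemove']");
        if (toRemove != null)
        {
            foreach (var node in toRemove)
            {
                node.Remove();
            }
        }

        return htmldoc.DocumentNode.SelectSingleNode("//div[@id='mainDiv']");
    }
}
```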
Upvotes: 6