eviabs

Reputation: 619

Html Agility Pack - Remove Tags by ID Or Class

Here is my simplified HTML:

<html>
  <body>
    <div id="mainDiv">
       <div id="divToRemove"></div>
       <div id="divToKeep"></div>
       <div class="divToRemove"></div>
       <div class="divToRemove"></div>
    </div>
  </body>
</html>

I want to remove every div whose id or class is "divToRemove", and then select only the div with id "mainDiv" (as an HtmlNode).

The result should be:

   <div id="mainDiv">
       <div id="divToKeep"></div>
   </div>

How can I do that using Html Agility Pack?

Thanks!

Upvotes: 4

Views: 8286

Answers (2)

Jacob Proffitt

Reputation: 12768

Personally, I prefer to use the LINQ methods of HtmlAgilityPack. The select will be long, but relatively straightforward: select the nodes with the right id and/or class, then call the Remove() method on each of them.

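// ToList() materializes the matches first; removing nodes while lazily
// enumerating Descendants() modifies the collection being iterated and can throw.
foreach (var node in doc.DocumentNode.Descendants("div")
    .Where(n => n.Id.Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase)
        || n.GetAttributeValue("class", string.Empty).Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase))
    .ToList())
    node.Remove();
// Select the remaining container div by id.
HtmlNode mainNode = doc.DocumentNode.Descendants("div").Where(n => n.Id.Equals("mainDiv", StringComparison.InvariantCultureIgnoreCase)).First();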

Upvotes: 1

scottheckel

Reputation: 9244

The following code is adapted from this Html Agility Pack forum page to fit your needs. Essentially, we grab all divs, loop through them, and check each one's class and id for a match. If either matches, we remove the div.

var divs = htmldoc.DocumentNode.SelectNodes("//div");
if (divs != null)
{
    foreach (var tag in divs)
    {
        if (tag.Attributes["class"] != null && string.Compare(tag.Attributes["class"].Value, "divToRemove", StringComparison.InvariantCultureIgnoreCase) == 0)
        {
            tag.Remove();
        }
        else if (tag.Attributes["id"] != null && string.Compare(tag.Attributes["id"].Value, "divToRemove", StringComparison.InvariantCultureIgnoreCase) == 0)
        {
            tag.Remove();
        }
    }
}

You can also combine these if statements into one large if statement, but I thought this read better for the answer.

Finally, select the node you were looking for...

var mainDiv = htmldoc.DocumentNode.SelectSingleNode("//div[@id='mainDiv']");
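
For reference, here is a rough end-to-end sketch with the two checks merged into a single condition. It assumes the HtmlAgilityPack NuGet package is referenced; the Program class and the inline HTML string are only there to make the snippet self-contained, and using GetAttributeValue with an empty-string default in place of the explicit null checks is my own substitution, so treat it as one possible variant rather than the definitive way to do it.

```csharp
using System;
using HtmlAgilityPack; // assumed: HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        // Sample markup from the question, inlined so the sketch is self-contained.
        var htmldoc = new HtmlDocument();
        htmldoc.LoadHtml(
            "<html><body><div id=\"mainDiv\">" +
            "<div id=\"divToRemove\"></div>" +
            "<div id=\"divToKeep\"></div>" +
            "<div class=\"divToRemove\"></div>" +
            "<div class=\"divToRemove\"></div>" +
            "</div></body></html>");

        // SelectNodes returns null when nothing matches, hence the guard.
        var divs = htmldoc.DocumentNode.SelectNodes("//div");
        if (divs != null)
        {
            foreach (var tag in divs)
            {
                // GetAttributeValue falls back to the supplied default when the
                // attribute is missing, so no separate null check is needed.
                string id = tag.GetAttributeValue("id", string.Empty);
                string cls = tag.GetAttributeValue("class", string.Empty);

                // Single combined condition: remove on an id OR class match.
                if (id.Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase)
                    || cls.Equals("divToRemove", StringComparison.InvariantCultureIgnoreCase))
                {
                    tag.Remove();
                }
            }
        }

        // Select the container after the unwanted children are gone.
        var mainDiv = htmldoc.DocumentNode.SelectSingleNode("//div[@id='mainDiv']");
        Console.WriteLine(mainDiv.OuterHtml);
    }
}
```

Note that the class comparison is an exact match on the whole attribute value, so a div with class="divToRemove other" would not be removed; if you need that, you would have to split the class value on whitespace first.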

Upvotes: 6
