Poison Poison

Reputation: 11

HtmlAgilityPack: Given an HTML file, how can I get nodes based on given class attributes?

Basically, I want to filter the HTML while preserving the hierarchy of the nodes. For example, given the document below, I want to keep only the branch of the tree that leads to the element with the class "b.1.1":


<html>
 <div class="a">
 </div>
 <div class="b">
     <div class="b.1">
           <div class="b.1.1">
              <span>me</span>
           </div>
           <div class="b.1.2">
           </div>
     </div>
 </div>
 <div class="c">
 </div>
</html>

The result should be:


<html>
 <div class="b">
     <div class="b.1">
           <div class="b.1.1">
              <span>me</span>
           </div>
     </div>
 </div>
</html>

Any ideas?

Upvotes: 1

Views: 400

Answers (1)

Dragos Durlut

Reputation: 166

You could write a recursive function that walks up through the parent nodes until it finds one containing an element with the given class:

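// Assumes an alias such as: using HAP = HtmlAgilityPack;
// Returns the nearest node (the given node or one of its ancestors) that has a
// descendant whose class attribute contains classToFind, or null if there is none.
private HAP.HtmlNode FindParentNodeThatContainsClass(string classToFind, HAP.HtmlNode node)
{
    // ".//" keeps the search relative to node; a leading "//" would search the whole document.
    string xPath = string.Format(".//*[contains(@class,'{0}')]", classToFind);
    var matches = node.SelectNodes(xPath); // SelectNodes returns null when nothing matches
    if (matches != null && matches.Count >= 1)
    {
        return node;
    }
    if (node.ParentNode != null)
    {
        // Pass the class name, not the XPath expression, to the recursive call.
        return FindParentNodeThatContainsClass(classToFind, node.ParentNode);
    }
    return null;
}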

I haven't tested the function, but that should get you started.
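
To get the pruned document asked for in the question, another option is to work from the matching elements outward: collect each match together with its ancestors and descendants, then remove every other element. The following is a minimal, untested sketch under these assumptions: HtmlPruner and KeepOnlyBranchesWithClass are hypothetical names, and the class attribute is compared as a whole space-separated token, so "b.1.1" does not also match "b.1.10".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
static class HtmlPruner
{
    public static string KeepOnlyBranchesWithClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Elements whose class attribute contains className as a whole token.
        var targets = doc.DocumentNode.Descendants()
            .Where(n => n.GetAttributeValue("class", "")
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Contains(className))
            .ToList();

        // Keep each target, everything above it, and everything below it.
        var keep = new HashSet<HtmlNode>();
        foreach (var target in targets)
        {
            foreach (var n in target.AncestorsAndSelf()) keep.Add(n);
            foreach (var n in target.Descendants()) keep.Add(n);
        }

        // Materialize the list first so the tree is not modified while it is enumerated.
        var toRemove = doc.DocumentNode.Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element && !keep.Contains(n))
            .ToList();
        foreach (var n in toRemove) n.Remove();

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling HtmlPruner.KeepOnlyBranchesWithClass(html, "b.1.1") on the sample document should leave only the html, "b", "b.1" and "b.1.1" elements plus the span. Whitespace text nodes between the removed elements are left in place, so the output may contain extra blank lines.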

Upvotes: 1
