willa

Reputation: 669

HtmlAgilityPack: parsing links and inner text

I am new to HtmlAgilityPack and am trying to figure out a way to get the links from HTML set up like this:

<div class="std"><div style="border-right: 1px solid #CCCCCC; float: left; height: 590px; width: 190px;"><div style="background-color: #eae3db; padding: 8px 0 8px  20px; font-weight: bold; font-size: 13px;">test</div>
    <div>
    <div style="font-weight: bold; margin: 5px 0 -6px;">FEATURED</div>
    <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat1</span></a></span>
     <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat2</span></a></span>
</div></div>

I have not written any code in C# yet, but I was wondering whether anyone could advise which tags I should point at to get the links and inner text when there are no HTML IDs. Thanks.

Upvotes: 1

Views: 1874

Answers (1)

Harrison

Reputation: 3953

If you are familiar with XPath, you can navigate through the elements and attributes of the HTML to get whatever you want. To get each href in the example above, you could write code as follows:

 const string xpath = "/div//span/a";

 // WebPage below is a string that contains the text of your example
 HtmlNode html = HtmlNode.CreateNode(WebPage);
 // The following gives you a node collection of your two <a> elements
 // (SelectNodes returns null, not an empty collection, when nothing matches)
 HtmlNodeCollection items = html.SelectNodes(xpath);
 if (items != null)
 {
      foreach (HtmlNode a in items)
      {
           if (a.Attributes.Contains("href"))
           {
                // Get your value here
                string yourValue = a.Attributes["href"].Value;
           }
      }
 }

Note: I have not run or tested this code
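
Since the question also asks for the inner text and there are no IDs to hook onto, one option is to select on the class attribute instead. The following is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced; the `webPage` string is a trimmed stand-in for the markup in the question, and the `Program` class and variable names are just for illustration.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Trimmed stand-in for the question's markup (assumption: only the link spans matter here).
        const string webPage =
            "<div class=\"std\"><div>" +
            "<span class=\"widget widget-category-link\"><a href=\"http://www.href1.com\"><span>cat1</span></a></span>" +
            "<span class=\"widget widget-category-link\"><a href=\"http://www.href1.com\"><span>cat2</span></a></span>" +
            "</div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(webPage);

        // Match <a> elements that have an href and sit directly inside
        // a span whose class contains "widget-category-link".
        const string xpath = "//span[contains(@class, 'widget-category-link')]/a[@href]";
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null when nothing matches, so guard before iterating.
        if (links == null)
        {
            Console.WriteLine("No links found.");
            return;
        }

        foreach (HtmlNode a in links)
        {
            // GetAttributeValue falls back to the supplied default if the attribute is missing.
            string href = a.GetAttributeValue("href", string.Empty);
            // InnerText concatenates the text of all descendants, here the nested <span>.
            string text = a.InnerText.Trim();
            Console.WriteLine(text + " -> " + href);
        }
    }
}
```

Matching on the class keeps the XPath independent of how deeply the spans are nested, which may be more robust than a path anchored at the outer div if the surrounding layout changes.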

Upvotes: 1
