Reputation: 669
I am new to HtmlAgilityPack and am trying to figure out a way to get the links from HTML set up like this:
<div class="std"><div style="border-right: 1px solid #CCCCCC; float: left; height: 590px; width: 190px;"><div style="background-color: #eae3db; padding: 8px 0 8px 20px; font-weight: bold; font-size: 13px;">test</div>
<div>
<div style="font-weight: bold; margin: 5px 0 -6px;">FEATURED</div>
<span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat1</span></a></span>
<span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat2</span></a></span>
</div></div>
I have not written any C# code yet, but I was wondering whether anyone could advise which tags I should point at to get the links and their inner text when the elements have no HTML IDs. Thanks
Upvotes: 1
Views: 1874
Reputation: 3953
If you are familiar with XPath, you can navigate through the elements and attributes of the HTML to get whatever you want, even when the elements have no IDs. To get each href in the example above, you could write code like this:
// Requires: using HtmlAgilityPack;
const string xpath = "//span/a";
// WebPage below is a string that contains the text of your example
var doc = new HtmlDocument();
doc.LoadHtml(WebPage);
// The following gives you a node collection of your two <a> elements,
// or null if nothing matches
HtmlNodeCollection items = doc.DocumentNode.SelectNodes(xpath);
if (items != null)
{
    foreach (HtmlNode a in items)
    {
        if (a.Attributes.Contains("href"))
        {
            // Get your values here
            string href = a.Attributes["href"].Value;
            string text = a.InnerText;
        }
    }
}
Note: I have not run or tested this code
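Since the question also asks for the inner text, and both spans in the sample share the widget-category-link class, one possible variation is to select on that class and return the href together with the text. This is only a sketch under a few assumptions: the HtmlAgilityPack NuGet package is referenced, the class name is stable across your pages, and LinkExtractor and ExtractLinks are names I made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class LinkExtractor
{
    // Returns (href, text) pairs for every <a href="..."> that sits directly
    // inside a span whose class attribute contains "widget-category-link".
    public static List<(string Href, string Text)> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Href, string Text)>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes(
            "//span[contains(@class, 'widget-category-link')]/a[@href]");
        if (anchors == null)
        {
            return results;
        }

        foreach (HtmlNode a in anchors)
        {
            // GetAttributeValue falls back to the default if the attribute is missing.
            string href = a.GetAttributeValue("href", string.Empty);
            // InnerText concatenates the text of nested elements such as the inner <span>.
            string text = HtmlEntity.DeEntitize(a.InnerText).Trim();
            results.Add((href, text));
        }

        return results;
    }
}
```

You would call it as LinkExtractor.ExtractLinks(WebPage) and loop over the result. Matching with contains(@class, ...) is a loose test that would also match longer class names containing the same substring, so tighten the XPath if that matters for your pages.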
Upvotes: 1