Fuzz Evans

Reputation: 2943

How to get href elements and attributes for each node?

I am working on a project that should read HTML, find all nodes that match a value, and then find elements and attributes of the located nodes. I am having difficulty figuring out how to get the href attributes and elements, though.

I am using HTMLAgilityPack. I have numerous nodes of

class="middle"

throughout the html. I need to get all of them, and from them, get the href element and attributes. Below is a sample of the html:

<div class="top">
        <div class="left">            
                <a href="item123">
                    <img src="url.png" border="0" />
                                    </a>
            </div>
        </div>
<div class="middle">
            <div class="title"><a href="item123">Captains Hat</a></div>

                            <div class="day">monday</div>

            <div class="city">Tuscon, AZ | 100 Days | <script typs="text/javascript">document.write(ts_to_age_min(1445620427));</script></div>

</div>

I have been able to get the other attributes I need, but not for 'href'. Here is the code I have:

List<string> listResults = new List<string>();         
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(url);                      

//get each listing                       
foreach (HtmlNode node in doc.DocumentNode.Descendants("div").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("middle")))
{
    string day = node.SelectSingleNode(".//*[contains(@class,'day')]").InnerHtml;
    string city = node.SelectSingleNode(".//*[contains(@class,'city')]").InnerHtml;
    string item = node.SelectSingleNode("//a").Attributes["href"].Value;

    listResults.Add(day + Environment.NewLine
        + city + Environment.NewLine
        + item + Environment.NewLine + Environment.NewLine);
}

However, the code above gives me the first href value in the whole HTML page, and it gives that same value for every node (visible by outputting the list to a message box). I thought that, inside my foreach loop, SelectSingleNode would get the first href attribute for that specific node. Why am I getting the first href attribute of the whole loaded page instead?

I've been going through lots of threads on here about getting href values with HtmlAgilityPack, but I haven't been able to get this to work.

How can I get the href attribute and elements for each node I'm selecting based off the class attribute (class="middle")?

Upvotes: 1

Views: 5973

Answers (1)

Calvin

Reputation: 61

Try replacing

 string item = node.SelectSingleNode("//a").Attributes["href"].Value;

with

 string item = node.SelectSingleNode(".//a").Attributes["href"].Value;

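An XPath expression that starts with // is evaluated from the root of the document, even when you call it on a child node, so every iteration finds the first a element on the page. The leading dot in .//a makes the search relative to the current node. Other than that, the code above works for me.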

Alternatively:

string item = node.SelectSingleNode(".//*[contains(@class,'title')]")
              .Descendants("a").FirstOrDefault().Attributes["href"].Value; 
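If you want to see the `//a` versus `.//a` difference in isolation, here is a minimal sketch. Note the assumptions: it uses the built-in `System.Xml.XmlDocument` rather than HtmlAgilityPack so it runs without any NuGet package, and the markup, the `XPathContextDemo` class, and the `Hrefs` helper are made up for illustration. The XPath rule it demonstrates should be the same one that applies to your `HtmlNode.SelectSingleNode` calls.

```csharp
using System;
using System.Xml;

public static class XPathContextDemo
{
    // Made-up sample: one link near the top of the page, then two "middle" blocks.
    public const string Sample =
        "<root>" +
        "<div class=\"top\"><a href=\"first-on-page\">x</a></div>" +
        "<div class=\"middle\"><div class=\"title\"><a href=\"item123\">Captains Hat</a></div></div>" +
        "<div class=\"middle\"><div class=\"title\"><a href=\"item456\">Other</a></div></div>" +
        "</root>";

    // Runs the given XPath against each "middle" div and collects the href it finds.
    public static string[] Hrefs(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);

        XmlNodeList middles = doc.SelectNodes("//div[@class='middle']");
        var result = new string[middles.Count];

        for (int i = 0; i < middles.Count; i++)
        {
            // The context node is the current div, but an expression starting
            // with "//" still searches from the document root.
            XmlNode link = middles[i].SelectSingleNode(xpath);

            // Null-conditional access avoids an exception when no link or href exists.
            result[i] = link?.Attributes?["href"]?.Value;
        }

        return result;
    }

    public static void Main()
    {
        // Absolute path: the same first link on the page for every div.
        Console.WriteLine(string.Join(", ", Hrefs("//a")));   // first-on-page, first-on-page

        // Relative path: the first link inside each div.
        Console.WriteLine(string.Join(", ", Hrefs(".//a")));  // item123, item456
    }
}
```

The null-conditional lookups are a design choice for the sketch: in the original loop, a "middle" div without a link would make `SelectSingleNode` return null and the `.Attributes["href"].Value` chain throw, so it may be worth guarding that in real code too.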

Upvotes: 2
