Matt D. Webb

Reputation: 3314

Select values from elements with a particular class name

I am getting an object reference error when parsing an external HTML file. I think this is because not all of the selected elements have a class attribute. Here is my code:

foreach (HtmlNode link in doc.DocumentNode.Descendants("li").Where(i => i.Attributes["class"].Value == "name"))
{
    string result = link.InnerText.Trim().Replace(" ", "");
    Console.WriteLine(result);
}

How do I select only the values where the class name is "name"?

Here is the HTML I'm trying to parse:

<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe,&nbsp;Bloggs</a>
    </span>

    <span class="country">
        <img src="/img/flags/15x15/USA.gif" alt="USA"/>
        United States
    </span>
</li>
<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe,&nbsp;Bloggs</a>
    </span>

    <span class="country">
        <img src="/img/flags/15x15/USA.gif" alt="USA"/>
        United States
    </span>
</li>
<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe,&nbsp;Bloggs</a>
    </span>

    <span class="country">
        <img src="/img/flags/15x15/RSA.gif" alt="RSA"/>
        South Africa
    </span>
</li>

Upvotes: 2

Views: 1206

Answers (1)

Sergey Berezovskiy

Reputation: 236208

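The li elements in your sample have no class attribute, so i.Attributes["class"] returns null and reading .Value throws the object reference error. The class attribute is on the span elements, and the text you want is in the a elements inside them, so you should select those a elements instead of the li elements. I suggest using an XPath predicate: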

var links = doc.DocumentNode.SelectNodes("//li/span[@class='name']/a");

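This XPath selects every span element that is a child of an li element and has a class attribute equal to name, and then selects the a element inside it. Note that SelectNodes returns null rather than an empty collection when nothing matches. You can then iterate over the results: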

foreach (var a in links)
    Console.WriteLine(a.InnerText);

For your sample HTML, the output is:

Joe,&nbsp;Bloggs
Joe,&nbsp;Bloggs
Joe,&nbsp;Bloggs
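
If you would rather stay with LINQ, as in the question, here is a minimal sketch of a null-safe variant. It assumes the HtmlAgilityPack package is referenced; the NameSelector class, the SelectNames method, and the inline sample markup are hypothetical names made up for illustration. It relies on GetAttributeValue, which returns the supplied default when the attribute is missing, so elements without a class attribute are skipped instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
static class NameSelector
{
    public static List<string> SelectNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // GetAttributeValue("class", "") yields "" for spans with no class
        // attribute, so the comparison never dereferences a null attribute.
        return doc.DocumentNode
            .Descendants("span")
            .Where(s => s.GetAttributeValue("class", "") == "name")
            .Select(s => s.InnerText.Trim())
            .ToList();
    }

    static void Main()
    {
        // Assumed sample: one span with class "name", one span with no class.
        const string html =
            "<li><span class=\"name\"><a href=\"#\">Joe</a></span><span>USA</span></li>";

        foreach (var name in SelectNames(html))
            Console.WriteLine(name);
    }
}
```

This compares the whole attribute value, so it would not match an element such as class="name active"; the XPath predicate above behaves the same way.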

Side note: you can use HttpUtility.HtmlDecode(a.InnerText) to get the decoded text (all HTML entities are decoded, not only &nbsp;).
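
One detail worth knowing when decoding: &nbsp; becomes the non-breaking space character U+00A0, not an ordinary space, so a call like Replace(" ", "") will not touch it. The sketch below illustrates this using WebUtility.HtmlDecode from System.Net, which I am assuming as a stand-in for HttpUtility.HtmlDecode so that no System.Web reference is needed; the DecodeDemo class and Normalize method are hypothetical names.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration.
static class DecodeDemo
{
    // Decodes HTML entities, then turns non-breaking spaces into ordinary ones.
    public static string Normalize(string raw)
    {
        string decoded = WebUtility.HtmlDecode(raw);
        return decoded.Replace('\u00A0', ' ');
    }

    static void Main()
    {
        string raw = "Joe,&nbsp;Bloggs";
        string decoded = WebUtility.HtmlDecode(raw);

        // The decoded text contains U+00A0 where &nbsp; was.
        Console.WriteLine(decoded.IndexOf('\u00A0') >= 0); // prints "True"

        Console.WriteLine(Normalize(raw)); // prints "Joe, Bloggs"
    }
}
```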


UPDATE: Parsing players

var players = from p in doc.DocumentNode.SelectNodes("//li")
              let name = p.SelectSingleNode("span[@class='name']/a")
              let country = p.SelectSingleNode("span[@class='country']")
              select new
              {
                  Name = (name == null) ? null : 
                         HttpUtility.HtmlDecode(name.InnerText.Trim()),
                  Country = (country == null) ? null :
                         HttpUtility.HtmlDecode(country.InnerText.Trim())
              };

Result:

[
  {
    Name: "Joe, Bloggs",
    Country: "United States"
  },
  {
    Name: "Joe, Bloggs",
    Country: "United States"
  },
  {
    Name: "Joe, Bloggs",
    Country: "South Africa"
  }
]

Upvotes: 3
