MartinS

Reputation: 751

Parse HTML table using LINQ and HtmlAgilityPack

I want to parse the date, link text, and link href from the table with class='nice' on the web page http://cslh.cz/delegace.html?id_season=2013

I have created a DelegationLink class:

public class DelegationLink
{
   public string date { get; set; }
   public string link { get; set; }
   public string anchor { get; set; }
}

and used it with LINQ to create a list of DelegationLink objects:

var parsedValues =
from table in htmlDoc.DocumentNode.SelectNodes("//table[@class='nice']")
from date in table.SelectNodes("tr//td")
from link in table.SelectNodes("tr//td//a")
   .Where(x => x.Attributes.Contains("href"))
select new DelegationLink
{
   date = date.InnerText,
   link = link.Attributes["href"].Value,
   anchor = link.InnerText,
};
return parsedValues.ToList();

This takes the date column cell by cell and combines each cell with every link in the table, but I simply want to take each row of the table and get the date, href, and link text from that row. I am new to LINQ and have searched Google for 4 hours without any success. Thanks for the help.

Upvotes: 1

Views: 3348

Answers (1)

shriek

Reputation: 5197

Well, that's rather easy: you just have to select the tr elements in the SelectNodes call and adjust your code a bit. Something like this:

// Requires: using System.Linq; using HtmlAgilityPack;
var parsedValues = htmlDoc.DocumentNode
    .SelectNodes("//table[@class='nice']/tr")
    .Skip(1) // skip the first row
    .Select(r =>
    {
        // First link inside the current row
        var linkNode = r.SelectSingleNode(".//a");
        return new DelegationLink()
        {
            // First cell of the current row
            date = r.SelectSingleNode(".//td").InnerText,
            link = linkNode.GetAttributeValue("href", ""),
            anchor = linkNode.InnerText,
        };
    });
return parsedValues.ToList();
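
As for why the query in the question pairs every date with every link: consecutive from clauses in a LINQ query behave like nested loops (a cross join), so each td is combined with each a in the table. Here is a minimal sketch of that effect using plain string arrays instead of HtmlAgilityPack nodes. The class name FromClauseDemo, its methods, and the sample dates and file names are all made up for illustration and are not taken from the real page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FromClauseDemo
{
    // Two consecutive from clauses: every date is combined with every link.
    public static List<string> Crossed(string[] dates, string[] links)
    {
        return (from d in dates
                from l in links
                select d + " -> " + l).ToList();
    }

    // One projection per row: each date stays with the link from its own row.
    public static List<string> PerRow(string[][] rows)
    {
        return rows.Select(r => r[0] + " -> " + r[1]).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data, not taken from the real page.
        var dates = new[] { "01.09.2013", "08.09.2013" };
        var links = new[] { "a.pdf", "b.pdf" };
        var rows = new[]
        {
            new[] { "01.09.2013", "a.pdf" },
            new[] { "08.09.2013", "b.pdf" },
        };

        Console.WriteLine(Crossed(dates, links).Count); // 4 combinations
        Console.WriteLine(PerRow(rows).Count);          // 2, one per row
    }
}
```

Selecting the rows first and projecting each one, as in the code above, corresponds to the PerRow case.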

Upvotes: 4
