ekl

Reputation: 53

Html Agility Pack parsing table into object

So I have HTML like this:

<tr class="row1">
        <td class="id">123</td>
        <td class="date">2014-08-08</td>
        <td class="time">12:31:25</td>
        <td class="notes">something here</td>
</tr>
<tr class="row0">
        <td class="id">432</td>
        <td class="date">2015-02-09</td>
        <td class="time">12:22:21</td>
        <td class="notes">something here</td>
</tr>

And it continues like that for each customer row. I want to parse the contents of each table row into an object. I've tried a few methods, but I can't get it to work right.

This is what I have currently:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//table[@id='customerlist']//tr"))
{
    Customer cust = new Customer();
    foreach (HtmlNode info in row.SelectNodes("//td"))
    {
        if (info.GetAttributeValue("class", String.Empty) == "id")
        {
            cust.ID = info.InnerText;
        }
        if (info.GetAttributeValue("class", String.Empty) == "date")
        {
            cust.DateAdded = info.InnerText;
        }
        if (info.GetAttributeValue("class", String.Empty) == "time")
        {
            cust.TimeAdded = info.InnerText;
        }
        if (info.GetAttributeValue("class", String.Empty) == "notes")
        {
            cust.Notes = info.InnerText;
        }
    }
    Console.WriteLine(cust.ID + " " + cust.TimeAdded + " " + cust.DateAdded + " " + cust.Notes);
}

It almost works: the loop runs once per row, but every iteration prints the data from the last row of the table. I'm missing something very simple but cannot see what.

Also, is my way of creating the object fine, or should I collect the values in variables and pass them to a constructor? E.g.

string Notes = String.Empty;
if (info.GetAttributeValue("class", String.Empty) == "notes")
{
    Notes = info.InnerText;
}
..
Customer cust = new Customer(id, other_variables, Notes, etc);

Upvotes: 2

Views: 1424

Answers (1)

haim770

Reputation: 49095

Your XPath query is wrong. You need to use td instead of //td:

foreach (HtmlNode info in row.SelectNodes("td"))

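Passing //td to SelectNodes() matches every <td> element in the document, not just those in the current row, because a leading // starts the search at the document root. With the two sample rows, your inner loop therefore runs 8 times instead of 4, and the later iterations overwrite the values already set on your Customer object, so every row ends up printing the last row's data.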

See XPath Examples
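
For illustration, here is a minimal sketch of the whole loop with the relative td query applied. The Customer class is not shown in the question, so its shape here (four string properties) is an assumption, and CustomerParser.Parse is a hypothetical helper name. The null checks are there because SelectNodes() returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Assumed shape of the asker's Customer class (not shown in the question).
public class Customer
{
    public string ID { get; set; }
    public string DateAdded { get; set; }
    public string TimeAdded { get; set; }
    public string Notes { get; set; }
}

// Hypothetical helper wrapping the loop from the question.
public static class CustomerParser
{
    public static List<Customer> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var customers = new List<Customer>();

        // SelectNodes returns null when there are no matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@id='customerlist']//tr");
        if (rows == null) return customers;

        foreach (HtmlNode row in rows)
        {
            // "td" is relative to the current row: only its direct <td> children.
            var cells = row.SelectNodes("td");
            if (cells == null) continue; // e.g. a header row containing only <th>

            var cust = new Customer();
            foreach (HtmlNode cell in cells)
            {
                string text = cell.InnerText.Trim();
                switch (cell.GetAttributeValue("class", string.Empty))
                {
                    case "id": cust.ID = text; break;
                    case "date": cust.DateAdded = text; break;
                    case "time": cust.TimeAdded = text; break;
                    case "notes": cust.Notes = text; break;
                }
            }
            customers.Add(cust);
        }
        return customers;
    }
}
```

If the cells could be nested deeper inside each row, ".//td" would select all descendants of the current row instead of only its direct children.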

Upvotes: 2
