yqit

Reputation: 682

HtmlAgilityPack reading data from an html page

I am trying to read data from an HTML table using Html Agility Pack, but every row returns the data from the first table row.

The HTML I am reading is the following:

<div id="mainDiv">
    <table id="tbl">
        <thead>
            <tr>
                <th class="tbl_col1">UserName</th>
                <th class="tbl_col2">Points</th>
            </tr>
        </thead>
        <tbody>     
          <tr data-source="provider1">
            <td class="tbl_col1">
                <a href="/Users/1090" id="UserLink" target="_blank">UserName1</a>           
            </td>
            <td class="tbl_col2">
                <a href="/UserPoints/1090" id="PointLink" target="_blank">1892 <span class="up_arrow">&nbsp;</span></a>             
            </td>           
          </tr>
          <tr data-source="provider2">
            <td class="tbl_col1">
                <a href="/Users/1090" id="UserLink" target="_blank">UserName2</a>           
            </td>
            <td class="tbl_col2">
                <a href="/UserPoints/1090" id="PointLink" target="_blank">3217 <span class="down_arrow">&nbsp;</span></a>               
            </td>           
         </tr>
        </tbody>
    </table>
</div>  

I am using this code:

var UserTable = htmlDocument.DocumentNode.SelectSingleNode("//div[@id='mainDiv']").SelectSingleNode("//table[@id='tbl']").SelectSingleNode("//tbody").SelectNodes("//tr");
foreach (var row in UserTable)
{
    if (row.Attributes["data-source"] != null)
    {
        string Source = row.Attributes["data-source"].Value;
        string UserName = row.SelectSingleNode("td[@class='tbl_col1']").SelectSingleNode("//a[@id='UserLink']/text()").InnerText;
        string Points = row.SelectSingleNode("td[@class='tbl_col2']").SelectSingleNode("//a[@id='PointLink']/text()").InnerText;
        Console.WriteLine(Source + "\t" + UserName + "\t" + Points);
    }
}

But I keep getting this output, where the second row repeats the first row's user name and points:

provider1       UserName1       1892
provider2       UserName1       1892

Upvotes: 0

Views: 1412

Answers (1)

Oleks

Reputation: 32343

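The problem is in the inner XPath expressions: //a[@id='UserLink']/text() and //a[@id='PointLink']/text() start with //, so they search the entire document rather than the current row, even though you call SelectSingleNode on a td node. That's why you always get the values from the first tr node. Use a path relative to row instead: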

string UserName = row.SelectSingleNode("td[@class='tbl_col1']/a[@id='UserLink']/text()").InnerText;
string Points = row.SelectSingleNode("td[@class='tbl_col2']/a[@id='PointLink']/text()").InnerText;

You can also simplify the rest of your code by selecting the rows with a single XPath expression:

var UserTable = htmlDocument.DocumentNode.SelectNodes("//div[@id='mainDiv']/table[@id='tbl']/tbody/tr");
foreach (var row in UserTable)
{
    if (row.Attributes["data-source"] != null)
    {
        string Source = row.Attributes["data-source"].Value;
        string UserName = row.SelectSingleNode("td[@class='tbl_col1']/a[@id='UserLink']/text()").InnerText;
        string Points = row.SelectSingleNode("td[@class='tbl_col2']/a[@id='PointLink']/text()").InnerText;
        Console.WriteLine(Source + "\t" + UserName + "\t" + Points);
    }
}
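
To make the scoping difference concrete, here is a minimal sketch that runs the same row loop with two different XPath expressions for the user link. It assumes the HtmlAgilityPack NuGet package is referenced. The class name RowScopeDemo, the Extract helper, and the trimmed-down HTML constant are made up for this illustration and are not part of the library or the original code. The sketch also filters rows with tr[@data-source] instead of the if check, and guards against SelectNodes returning null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, assumed to be referenced

public static class RowScopeDemo
{
    // Trimmed-down copy of the table from the question (illustrative only).
    public const string Html = @"
<div id='mainDiv'><table id='tbl'><tbody>
  <tr data-source='provider1'>
    <td class='tbl_col1'><a id='UserLink'>UserName1</a></td>
    <td class='tbl_col2'><a id='PointLink'>1892 <span class='up_arrow'>&nbsp;</span></a></td>
  </tr>
  <tr data-source='provider2'>
    <td class='tbl_col1'><a id='UserLink'>UserName2</a></td>
    <td class='tbl_col2'><a id='PointLink'>3217 <span class='down_arrow'>&nbsp;</span></a></td>
  </tr>
</tbody></table></div>";

    // Hypothetical helper: evaluates userXPath against every data row
    // and returns the trimmed text of the node it finds in each one.
    public static List<string> Extract(string html, string userXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        // Only rows that carry a data-source attribute are selected.
        var rows = doc.DocumentNode.SelectNodes(
            "//div[@id='mainDiv']/table[@id='tbl']/tbody/tr[@data-source]");
        if (rows == null) return result; // SelectNodes yields null on no match

        foreach (var row in rows)
        {
            var user = row.SelectSingleNode(userXPath);
            result.Add(user == null ? "" : user.InnerText.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Leading "//" searches from the document root, so every row
        // finds the first link in the whole document.
        Console.WriteLine(string.Join(", ", Extract(Html, "//a[@id='UserLink']")));

        // Leading ".//" searches only the descendants of the current row.
        Console.WriteLine(string.Join(", ", Extract(Html, ".//a[@id='UserLink']")));
    }
}
```

The .// form is handy when you do not want to spell out the intermediate td step. The fully relative path td[@class='tbl_col1']/a[@id='UserLink'] from the answer is stricter about where the link may appear.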

Upvotes: 2
