user10211766


HtmlAgilityPack SelectNodes returns null

I used this code to scrape the page info, but the site has since changed and my application now fails with a null reference error.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(page);
var query = doc.DocumentNode
  .SelectNodes("//table[@class='table table-striped table-hover']/tr")
  .Select(r => {
    return new DelegationLink()
    {
        Row = r.SelectSingleNode(".//td").InnerText,
        Category = r.SelectSingleNode(".//td[2]").InnerText
    };
}).ToList();

And this is my HTML:

 <div role="tabpanel" class="tab-pane fade " id="tab3">
                <div class="circular-div">
    <table class="table table-striped table-hover" id="circular-table">
        <thead>
            <tr>
                <th>ردیف</th>
                <th>دسته بندی</th>
                <th>عنوان</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>1</td>
                <td>بخشنامه‌ها</td>
                <td>اطلاعیه جهاد دانشگاهی</td>
            </tr>
            <tr>
                <td>2</td>
                <td>بخشنامه‌ها</td>
...
...
...

What am I doing wrong?

Upvotes: 1

Views: 879

Answers (1)

Daniel Manta

Reputation: 6683

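The table rows are not direct children of the table; they are nested inside thead and tbody. Your XPath ends in table/tr, so it matches nothing, and SelectNodes returns null when there is no match, which is why the following Select call throws. You also want to skip the header and scrape only the body of the table, so select the rows through tbody: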

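var query = doc.DocumentNode
    // Rows live under <tbody>, so step through it; this also skips the <thead> header row.
    .SelectNodes("//table[@class='table table-striped table-hover']/tbody/tr")
    .Select(r => new DelegationLink
    {
        // First and second cells of the current row.
        Row = r.SelectSingleNode("./td[1]").InnerText,
        Category = r.SelectSingleNode("./td[2]").InnerText
    })
    .ToList();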
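
Since the site has already changed once, it may be worth guarding against the null that SelectNodes returns when nothing matches. The following is only a sketch under a few assumptions: the class name CircularTableParser and the method Parse are made up for illustration, DelegationLink is assumed to have two string properties, and the table is selected by the id circular-table from the posted HTML rather than by its exact class string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Assumed shape of the question's DelegationLink type.
public class DelegationLink
{
    public string Row { get; set; }
    public string Category { get; set; }
}

// Hypothetical helper, not part of HtmlAgilityPack.
public static class CircularTableParser
{
    public static List<DelegationLink> Parse(string page)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(page);

        // Selecting by id avoids depending on the exact class attribute value.
        var rows = doc.DocumentNode
            .SelectNodes("//table[@id='circular-table']/tbody/tr");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (rows == null)
        {
            return new List<DelegationLink>();
        }

        return rows
            .Select(r => r.SelectNodes("./td"))
            // Skip rows that have fewer than two cells.
            .Where(cells => cells != null && cells.Count >= 2)
            .Select(cells => new DelegationLink
            {
                Row = HtmlEntity.DeEntitize(cells[0].InnerText).Trim(),
                Category = HtmlEntity.DeEntitize(cells[1].InnerText).Trim()
            })
            .ToList();
    }
}
```

With this shape, CircularTableParser.Parse(page) yields an empty list instead of throwing if the markup changes again, so the caller can decide how to report that.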

Upvotes: 1
