Reputation: 1479
I have a table like this:
<table border="0" cellpadding="0" cellspacing="0" id="table2">
<tr>
<th>Name
</th>
<th>Age
</th>
</tr>
<tr>
<td>Mario
</td>
<th>Age: 78
</td>
</tr>
<tr>
<td>Jane
</td>
<td>Age: 67
</td>
</tr>
<tr>
<td>James
</td>
<th>Age: 92
</td>
</tr>
</table>
and I am using Html Agility Pack to parse it. I have tried the code below, but it is not returning the expected results:
foreach (HtmlNode tr in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
{
//looping on each row, get col1 and col2 of each row
HtmlNodeCollection tds = tr.SelectNodes("td");
for (int i = 0; i < tds.Count; i++)
{
Response.Write(tds[i].InnerText);
}
}
I am reading each column separately because I would like to do some processing on the contents of each cell.
What am I doing wrong?
Upvotes: 0
Views: 8336
Reputation: 17380
This is my solution. Please note that your HTML is not well formed, because you have an opening <th> tag where a <td> should be. Here is the corrected markup:
<table border="0" cellpadding="0" cellspacing="0" id="table2">
<tr>
<th>Name
</th>
<th>Age
</th>
</tr>
<tr>
<td>Mario
</td>
<td>Age: 78
</td>
</tr>
<tr>
<td>Jane
</td>
<td>Age: 67
</td>
</tr>
<tr>
<td>James
</td>
<td>Age: 92
</td>
</tr>
</table>
And this is the C# code:
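using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the page from disk and locate the table by its id.
            HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
            document.Load("page.html");
            List<HtmlNode> rows = document.GetElementbyId("table2").Elements("tr").ToList();
            foreach (HtmlNode row in rows)
            {
                // The header row contains only <th> cells, so it yields no <td> elements.
                List<HtmlNode> cells = row.Elements("td").ToList();
                foreach (HtmlNode cell in cells)
                {
                    Console.WriteLine("TD Value: " + cell.InnerText);
                }
            }
            Console.ReadLine(); // keep the console window open
        }
    }
}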
Edit: I should add that if you are going to use <th> cells for the header row, it is good practice to put that row inside a <thead> tag and the data rows inside a <tbody> tag. HTML does not strictly require this, but it makes the structure of the table explicit :)
More info: http://www.w3schools.com/tags/tag_thead.asp
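One practical consequence of adding <thead> and <tbody>: the rows are then no longer direct children of the table, so a call such as Elements("tr") on the table node would stop finding them. The following is only a minimal sketch of that difference, assuming Html Agility Pack is referenced and the HTML is passed in as a string; the names TbodyDemo and CountRows are hypothetical and not part of the library.

```csharp
using System.Linq;
using HtmlAgilityPack;

static class TbodyDemo
{
    // Hypothetical helper: counts <tr> nodes two ways for the table with the given id.
    public static (int direct, int nested) CountRows(string html, string tableId)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        HtmlNode table = document.GetElementbyId(tableId);
        if (table == null)
        {
            return (0, 0); // no element with that id
        }

        int direct = table.Elements("tr").Count();    // direct children of <table> only
        int nested = table.Descendants("tr").Count(); // rows at any depth, e.g. inside <tbody>
        return (direct, nested);
    }
}
```

If the rows are wrapped in <thead>/<tbody>, Descendants("tr") (or an XPath such as ".//tr") is probably the safer way to reach them.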
Upvotes: 0
Reputation: 23
You can grab the cell content in a single foreach loop by selecting the <td> nodes directly in the XPath:
foreach (HtmlNode td in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td"))
{
Response.Write(td.InnerText);
}
Also, I'd recommend trimming and de-entitizing the inner text to ensure it is clean:
Response.Write(HtmlEntity.DeEntitize(td.InnerText).Trim());
In your source, the cells for [Age: 78] and [Age: 92] open with a <th> tag instead of <td>, so the td selector does not match them.
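If you still want to process the cells row by row, as in the question, keep in mind that SelectNodes returns null rather than an empty collection when nothing matches, so a row with no <td> cells (such as the header row) would make tds.Count fail. A minimal sketch with null checks follows; it assumes Html Agility Pack is referenced, and the names TableReader and ReadRows are made up for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

static class TableReader
{
    // Hypothetical helper: returns one list of cleaned cell texts per row that has <td> cells.
    public static List<List<string>> ReadRows(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();
        HtmlNodeCollection trs =
            doc.DocumentNode.SelectNodes("//table[@id='" + tableId + "']//tr");
        if (trs == null)
        {
            return rows; // table not found or it has no rows
        }

        foreach (HtmlNode tr in trs)
        {
            HtmlNodeCollection tds = tr.SelectNodes("td");
            if (tds == null)
            {
                continue; // e.g. the header row, which only has <th> cells
            }

            var cells = new List<string>();
            foreach (HtmlNode td in tds)
            {
                // Decode entities and strip surrounding whitespace/newlines.
                cells.Add(HtmlEntity.DeEntitize(td.InnerText).Trim());
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

Returning the rows as lists instead of writing them out immediately keeps the parsing separate from whatever processing you want to do on each column afterwards.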
Upvotes: 1