Reputation: 8302
I'm trying to scrape an HTML table full of data on a website. Unfortunately, the source code for the table looks like this:
<table border="1" cellspacing="0" cellpadding="3">
<tr>
<td bgcolor="silver"><font face="arial,helvetica" size="1">Last Name</font></td>
<td bgcolor="silver"><font face="arial,helvetica" size="1">First Name</font></td>
<td bgcolor="silver"><font face="arial,helvetica" size="1">Middle</font></td>
</tr>
<td valign="top"><font face="arial,helvetica" size="1">
Data</font></td>
<td valign="top"><font face="arial,helvetica" size="1">
Data</font></td>
<td valign="top"><font face="arial,helvetica" size="1">
Data</font></td>
</tr>
<td valign="top"><font face="arial,helvetica" size="1">
More Data</font></td>
<td valign="top"><font face="arial,helvetica" size="1">
More Data</font></td>
<td valign="top"><font face="arial,helvetica" size="1">
More Data</font></td>
</tr>
</table>
Note the lack of starting "tr" tags for each row after the header. The table shows up fine in a browser, but the HTML Agility Pack does not recognize the tr elements that have no start tag. Is there any way I can get the HTML Agility Pack to fix this issue? I'd rather not insert the tr tags myself, but I will if I have to.
Upvotes: 1
Views: 983
Reputation: 116098
You can try selecting all the td elements and grouping them in threes. The header cells are td elements as well, so the first group holds the column names:
// Pair each td with its position, then bucket every 3 consecutive cells into one row.
var list = doc.DocumentNode.Descendants("td")
    .Select((td, i) => new { td, i })
    .GroupBy(x => x.i / 3)
    .Select(g => g.Select(t => t.td.InnerText).ToList())
    .ToList();
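The same grouping idea can be pulled out into a small helper that works on plain strings, which makes it easy to check without the page itself. This is only a sketch: the names `RowGrouping` and `GroupIntoRows` are made up for illustration, and it assumes every row has the same number of cells.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the HTML Agility Pack.
public static class RowGrouping
{
    // Splits a flat sequence of cell texts into rows of `columns` cells each.
    // Each cell is trimmed because the markup in the question puts a line
    // break in front of every data value.
    public static List<List<string>> GroupIntoRows(IEnumerable<string> cells, int columns)
    {
        if (columns <= 0)
            throw new ArgumentOutOfRangeException(nameof(columns));

        return cells
            .Select((text, index) => new { text, index })
            .GroupBy(x => x.index / columns)   // cells 0..columns-1 form row 0, and so on
            .Select(g => g.Select(x => x.text.Trim()).ToList())
            .ToList();
    }
}
```

With the parsed document from above, it could be called as `RowGrouping.GroupIntoRows(doc.DocumentNode.Descendants("td").Select(td => td.InnerText), 3)`, and adding `.Skip(1)` to the result would drop the header row. If the number of cells is not a multiple of `columns`, the last row simply comes out shorter, so a length check may be worth adding if the page can contain other td elements.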
Upvotes: 2