Reputation: 5
I am stuck on how to create a proper CSV file from an HTML table. I am using HtmlAgilityPack to load the HTML from a string into an HtmlDocument, and then XPath to loop through the rows and columns.
The problem is that I cannot determine the correct grid position (row, column) of a particular cell.
Example HTML:
<html>
<body>
<table border="1">
<tr>
<td rowspan="2">
100
</td>
<td>
200
</td>
<td colspan="2">
300
</td>
</tr>
<tr>
<td colspan="2">
400
</td>
<td>
600
</td>
</tr>
<tr>
<td>
400
</td>
<td>
500
</td>
<td>
600
</td>
</tr>
</table>
</body>
</html>
When I open it in Excel and save it as CSV, I get the desired output, which is:
100,200,300,
,400,,600
400,500,600,
Can someone help me create the same output in .NET, respecting the rowspan and colspan attributes?
Thanks! Dex
Upvotes: 0
Views: 1105
Reputation: 1769
You don't need to know which row and column you are on. All you need to do is add a "," for each new column you find and a line break every time you reach the end of a row.
If you navigate the document as if it were XML, all you have to do is iterate over the TR nodes, adding a line break when you reach the end of each one's child node list, and within each TR iterate over its TD nodes, adding a "," where necessary.
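Note that a plain traversal like this emits cells in document order, so by itself it would give `400,600` for the second row of the example rather than `,400,,600`. To get the empty fields that rowspan and colspan leave behind, one option is to place each cell into a grid and skip positions already claimed by an earlier span. Below is a minimal sketch of that idea in Python using only the standard library (not .NET or HtmlAgilityPack). The names `TableCollector`, `table_to_grid`, `html_table_to_csv` and `SAMPLE` are made up for illustration, and the sketch assumes a single, non-nested table with closed `td`/`th` tags and numeric span attributes. The grid-filling loop itself should carry over to a TR/TD loop over HtmlAgilityPack nodes.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects each <tr> as a list of (text, rowspan, colspan) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell tuples per <tr>
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            # Missing or empty span attributes default to 1.
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def table_to_grid(rows):
    """Places cells on a rectangular grid; spanned-over positions stay empty."""
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(rows):
        c = 0
        for text, rowspan, colspan in cells:
            # Skip positions already claimed by a span from an earlier cell.
            while (r, c) in grid:
                c += 1
            # Claim the whole spanned block, then put the text in its top-left corner.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = ""
            grid[(r, c)] = text
            c += colspan
    width = max((col for _, col in grid), default=-1) + 1
    # Rows are limited to the <tr> elements actually present.
    return [[grid.get((r, c), "") for c in range(width)] for r in range(len(rows))]


def html_table_to_csv(html):
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(table_to_grid(parser.rows))
    return out.getvalue()


# The table from the question, condensed onto fewer lines.
SAMPLE = """
<table border="1">
<tr><td rowspan="2">100</td><td>200</td><td colspan="2">300</td></tr>
<tr><td colspan="2">400</td><td>600</td></tr>
<tr><td>400</td><td>500</td><td>600</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE), end="")
```

For the example table in the question this produces the three lines `100,200,300,`, `,400,,600` and `400,500,600,`, which matches the Excel output.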
Upvotes: 2