Reputation: 3080
I have the following HTML layout:
<table> //table[1]
</table>
<table> //table[2]
<tbody>
<tr>
<td>
<p>
</p>
</td>
</tr>
<tr>
<td>
<table> //table[1]//table[1]
<tbody>
<tr>
<td>
<p>
INFO 1
</p>
</td>
<td>
<p>
INFO 2
</p>
</td>
<td>
<p>
INFO 3
</p>
</td>
<td>
<p>
INFO 4
</p>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table> //table[1]//table[2]
<tbody>
<tr>
<td>
<p><strong>Name</strong></p>
</td>
<td>
<p><strong>Quantity</strong></p>
</td>
</tr>
<tr>
<td>
<p>Apples </p>
</td>
<td>10</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table> //table[1]//table[3]
</table>
</td>
</tr>
</tbody>
</table>
I am trying to get the data within //table[1]//table[2], yet I keep getting a null HtmlNode (followed by a System.NullReferenceException) for the following:
doesn't work: doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr//td//table[2]//tbody//tr");
I am not sure why this happens, because when I try to get the data for //table[1]//table[1] with this syntax, it works just fine:
works: doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr//td//table[1]//tbody//tr");
Am I misunderstanding how indexing works with Html Agility Pack?
Upvotes: 0
Views: 1740
Reputation: 89325
//table[2] returns the 2nd <table> element within the same parent, because in XPath:
The predicate ([]) has a higher precedence (priority) than (// and /). [For Reference]
In your case, there is only one <table> in each <td>, so the XPath expression returned nothing. One possible solution is to add parentheses to alter the precedence:
(//table[2]//tbody//tr//td//table)[2]//tbody//tr
The XPath above gets the 2nd <table> element from all the <table>s returned by the inner XPath //table[2]//tbody//tr//td//table. Then, from that <table>, it goes on to return the descendant //tbody//tr elements.
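To see the difference side by side, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the sample HTML is a trimmed-down stand-in for your layout, and the class name is made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class XPathPrecedenceDemo
{
    static void Main()
    {
        // Assumed sample: a cut-down copy of the layout in the question, not the real page.
        const string html = @"
<html><body>
<table><tr><td>first top-level table</td></tr></table>
<table><tbody>
  <tr><td></td></tr>
  <tr><td><table><tbody><tr><td>INFO 1</td></tr></tbody></table></td></tr>
  <tr><td><table><tbody><tr><td>Apples</td><td>10</td></tr></tbody></table></td></tr>
</tbody></table>
</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // table[2] is evaluated per parent: no <td> has two <table> children,
        // so this lookup is expected to find nothing and return null.
        HtmlNode perParent = doc.DocumentNode.SelectSingleNode(
            "//table[2]//tbody//tr//td//table[2]//tbody//tr");
        Console.WriteLine("Per-parent index found a row: " + (perParent != null));

        // The parentheses collect all nested tables first, then [2] picks
        // the second one in document order.
        HtmlNode grouped = doc.DocumentNode.SelectSingleNode(
            "(//table[2]//tbody//tr//td//table)[2]//tbody//tr");
        Console.WriteLine("Grouped index found a row: " + (grouped != null));

        if (grouped != null)
        {
            // Print the cells of the matched row, one per line.
            foreach (HtmlNode cell in grouped.SelectNodes("./td"))
            {
                Console.WriteLine(cell.InnerText.Trim());
            }
        }
    }
}
```

Checking the result for null before using it also avoids the NullReferenceException when an expression matches nothing.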
Upvotes: 1
Reputation: 3080
I ended up basing this off of the tr's. I am not sure why my other way did not work, but this way does.
I basically moved my indexing one level up from the tables. Within the first tbody, each nested table sits inside its own tr/td, so I simply constructed my HtmlNode to index off of the tr's. Maybe Agility Pack works better if you broaden the selection? IDK.
Anyway...
For table[2]//table[1] I used:
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr[2]//table");
foreach (var cell in table.SelectNodes(".//tr//td/p"))
...
I selected tr[2] because there is an earlier tr/td that contains only a blank paragraph, as you can see in the example HTML above.
For table[2]//table[2] I used:
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[2]//tbody//tr[3]//table[1]");
foreach (var cell in table.SelectNodes(".//tr//td"))
...
For anyone having similar issues, try moving the index from the most specific tag to a broader, higher-level one.
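Here is a minimal, self-contained sketch of the second lookup with null checks added. It assumes the HtmlAgilityPack NuGet package; the sample HTML is a trimmed-down copy of the layout from my question, and the class name is just for illustration.

```csharp
using System;
using HtmlAgilityPack;

class NestedTableReader
{
    static void Main()
    {
        // Assumed sample: cut-down version of the layout from the question.
        const string html = @"
<html><body>
<table><tr><td>first top-level table</td></tr></table>
<table><tbody>
  <tr><td></td></tr>
  <tr><td><table><tbody><tr><td>INFO 1</td><td>INFO 2</td></tr></tbody></table></td></tr>
  <tr><td><table><tbody>
    <tr><td><strong>Name</strong></td><td><strong>Quantity</strong></td></tr>
    <tr><td>Apples</td><td>10</td></tr>
  </tbody></table></td></tr>
</tbody></table>
</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Index on the outer row (tr[3]) rather than on the nested table,
        // because each nested table is the only table inside its own td.
        HtmlNode table = doc.DocumentNode.SelectSingleNode(
            "//table[2]//tbody//tr[3]//table[1]");
        if (table == null)
        {
            Console.WriteLine("Nested table not found.");
            return;
        }

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection cells = table.SelectNodes(".//tr//td");
        if (cells == null)
        {
            Console.WriteLine("No cells found.");
            return;
        }

        foreach (HtmlNode cell in cells)
        {
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```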
Upvotes: 0