Reputation: 14736
I have many tables in this format:
<table class="DataRows" frame="myFrames" rules="Standard" width="100%">
<colgroup><col width="70" align="CENTER">
<col width="200" align="LEFT">
<col width="80" align="LEFT">
<col align="LEFT">
<col align="RIGHT">
</colgroup><thead>
<col width="70" align="CENTER">
<col width="200" align="LEFT">
<col width="80" align="LEFT">
<col align="LEFT">
<col align="RIGHT">
<thead>
<tr>
<td valign="TOP"><span class="classicBold"> 20 </span> Kg.
<td class="BOLD" valign="TOP" nowrap="">
PA Passion Foods Inc.
<td class="BOLD">Fax:
<td>
222-555666
<td class="BOLD">
Processed foods and juices
<tr>
<td><a target="_blank" href="">See on Map </a>
<td>
120 NW 157TH AVE
<td class="BOLD">Warehouse Hours:
<td colspan="2">
<tr>
<td>
<td><span class="BOLD">
Jacksonville,
</span>
FL 300000
<td class="BOLD">Url:
<td colspan="2">
<a target="_blank" href="">PA Passion</a>
  
<span class="BOLD">E-mail:</span>
[email protected]
<tr>
<td>
<td class="REDBOLD" colspan="4">
<tr>
<td>
<td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
Nutrella
</span>
<tr>
<td>
<td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
APPLE Foods, Constants
</span>
<tr>
<td>
<td colspan="4" align="LEFT"><span class="BOLD">
</span>
<tr>
<td>
<td colspan="4" align="LEFT">We service:<span class="BOLD">
All occasions and hospitality services
</span>
<tr>
<td>
<td colspan="4" align="LEFT">We sell :<span class="BOLD">
----
</span>
</td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></td></td></tr></td></td></td></td></tr></td></td></td></td></td></tr>
</thead>
</table>
I am using HtmlAgilityPack to loop through each of the tables with this code:
foreach (HtmlNode node in htmlAgilityPackDoc.DocumentNode.SelectNodes("//table[contains(@class,'DataRows')]"))
{
}
Each iteration gives me the entire node for one table like the one above. I tried to read the company name in each iteration using the code below:
string str= node.ChildNodes.Descendants() .SelectSingleNode("//td[@class='BOLD']").InnerText
But all I got was the company name of the first table, for every table visited in the loop. How do I get each table's own company name and address as I go through the tables in the loop?
Upvotes: 0
Views: 122
Reputation: 89295
This is a common mistake when trying to write a relative XPath that starts with //. Even though you are calling SelectSingleNode() on the node variable, an XPath that begins with // is still treated as global, meaning it is evaluated from the root of the document rather than from node. That's why you get the same element every time: it is the first matching element in the entire document.
To scope the XPath to the current node element, simply put a single dot (.) at the beginning of the XPath:
string str = node.SelectSingleNode(".//td[@class='BOLD']")
.InnerText;
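For context, here is a minimal sketch of the whole loop with the leading dot applied. The class name CompanyNames, the method FirstBoldCellPerTable, and the SampleHtml string are hypothetical names made up for illustration; the sample markup is a simplified, well-formed stand-in for the tables in the question, so results on the original markup (with its unclosed td tags) may differ. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class CompanyNames
{
    // Hypothetical sample: two simplified tables, each with one td of class BOLD.
    public const string SampleHtml = @"
<html><body>
<table class='DataRows'><tr><td class='BOLD'>PA Passion Foods Inc.</td></tr></table>
<table class='DataRows'><tr><td class='BOLD'>Second Company Ltd.</td></tr></table>
</body></html>";

    // Returns the trimmed text of the first td with class BOLD inside each DataRows table.
    public static List<string> FirstBoldCellPerTable(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();

        // Starting with // is fine here: we do want to search the whole document.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection tables =
            doc.DocumentNode.SelectNodes("//table[contains(@class,'DataRows')]");
        if (tables == null)
        {
            return names;
        }

        foreach (HtmlNode table in tables)
        {
            // Leading dot: search only among the descendants of this table.
            HtmlNode cell = table.SelectSingleNode(".//td[@class='BOLD']");
            if (cell != null)
            {
                names.Add(cell.InnerText.Trim());
            }
        }

        return names;
    }
}
```

Calling CompanyNames.FirstBoldCellPerTable(CompanyNames.SampleHtml) should yield one entry per table. The null checks are there because both SelectNodes and SelectSingleNode return null when nothing matches, which would otherwise throw inside the loop.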
Upvotes: 1
Reputation: 439
node.SelectSingleNode(".//td[@class='BOLD']").InnerText
This might work. As mentioned in a comment, with HAP an XPath that is meant to continue from a previously selected node should start with "." (the current node), if I remember correctly.
Upvotes: 0