Reputation: 23830
I have to gather information from a product page whose elements have no class or id attributes. I am using HtmlAgilityPack and C# 4.0.
The page source contains many tables. The prices table contains the string " KDV", so I would like to select the table that contains this string. How can I do that?
For example, the XPath below selects all tables:
string srxPathOfCategory = "//table";
var selectedNodes = myDoc.DocumentNode.SelectNodes(srxPathOfCategory);
The expression below selects the table, but it matches starting from the outermost table. I need to select the innermost table that contains the given string:
//table[contains(., ' KDV')]
c# , xpath , htmlagilitypack
Upvotes: 3
Views: 2345
Reputation: 3241
There might be a more efficient way to do it. Anyway, this is the entire code I have used for your case and it works for me:
HtmlDocument doc = new HtmlDocument();
string url = "http://www.pratikev.com/fractalv33/pratikEv/pages/viewProduct.jsp?pInstanceId=3138821";
using (var response = (WebRequest.Create(url).GetResponse()))
{
doc.LoadHtml(new StreamReader(response.GetResponseStream()).ReadToEnd());
}
/* There is a bug in the XPath used here. It should have been
(//table/tr/td/font[contains(.,'KDV')])[1]/ancestor::table[2]
See Dimitre's answer for an explanation and an alternative,
more generic and (needless to say) better approach. */
string xpath = "//table/tr/td/font[contains(.,'KDV')][1]/ancestor::table[2]";
HtmlNode table = doc.DocumentNode.SelectSingleNode(xpath);
Upvotes: 1
Reputation: 243479
The code below selects the table but starting from most outer table. I need to select most inner table which contains that given string
Use:
//table
[not(descendant::table)
and
.//text()[contains(., ' KDV')]
]
This selects any table in the XML document that doesn't have a table descendant, and that has a text node descendant that contains the string " KDV".
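To see how this differs from the expression in the question, here is a minimal sketch. It uses the built-in System.Xml XPath engine rather than HtmlAgilityPack, only so that it runs without extra packages; the expressions themselves are unchanged. The sample markup and the names InnermostTableDemo and Count are made up for illustration.

```csharp
using System;
using System.Xml;

public static class InnermostTableDemo
{
    // Hypothetical sample: three nested tables, only the innermost one
    // directly holds the price text, plus one unrelated table.
    public const string Sample =
        "<html><body>" +
        "<table id='outer'><tr><td>" +
          "<table id='middle'><tr><td>" +
            "<table id='inner'><tr><td>100 TL + KDV</td></tr></table>" +
          "</td></tr></table>" +
        "</td></tr></table>" +
        "<table id='other'><tr><td>no price here</td></tr></table>" +
        "</body></html>";

    // Expression from the question: matches every table whose string value contains " KDV".
    public const string AnyContaining = "//table[contains(., ' KDV')]";

    // Expression from this answer: matches only tables without nested tables.
    public const string InnermostOnly =
        "//table[not(descendant::table) and .//text()[contains(., ' KDV')]]";

    // Counts the nodes that the given XPath selects in the sample document.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        Console.WriteLine(Count(AnyContaining)); // 3: outer, middle and inner
        Console.WriteLine(Count(InnermostOnly)); // 1: inner only
    }
}
```

The outer and middle tables match the first expression because their string value includes the text of everything nested inside them.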
In general the above expression could select many such table elements.
If you want only one of them selected (say the first), use this XPath expression -- do notice the brackets:
(//table
[not(descendant::table)
and
.//text()[contains(., ' KDV')]
]
)[1]
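Applied to the setup from the question, this could look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the class name FirstInnermostTable, the helper Find and the sample markup are hypothetical stand-ins for the real product page.

```csharp
using System;
using HtmlAgilityPack; // NuGet package, as used in the question

public static class FirstInnermostTable
{
    // First table (in document order) that has no nested table
    // and contains a text node with " KDV".
    public const string XPath =
        "(//table[not(descendant::table) and .//text()[contains(., ' KDV')]])[1]";

    // Returns the matching table node, or null when nothing matches.
    public static HtmlNode Find(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode(XPath);
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the real product page.
        string html =
            "<table><tr><td>" +
            "<table><tr><td>Price: 100 TL + KDV</td></tr></table>" +
            "</td></tr></table>";

        HtmlNode table = Find(html);
        Console.WriteLine(table == null ? "no match" : table.InnerText);
    }
}
```

SelectSingleNode returns null when the expression matches nothing, so the result should be checked before it is used.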
Remember: If you want to select the first someName element in the document, using this (as in the currently accepted answer) is wrong:
//someName[1]
This is the second most frequently asked question in XPath (after the one about how to select elements with unprefixed names in an XML document with a default namespace).
The expression above actually selects every someName element in the document that is the first someName child of its parent -- try it.
The reason for this unintuitive behavior is that the XPath [] operator has a higher precedence (priority) than the // pseudo-operator.
The correct expression that really selects only the first someName element (in any XML document), if such an element exists, is:
(//someName)[1]
Here the brackets are used to explicitly override the default XPath operator precedence.
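The difference is easy to check on a small document. The following is a minimal sketch using the built-in System.Xml XPath engine; the sample XML and the names FirstElementDemo and Count are invented for illustration.

```csharp
using System;
using System.Xml;

public static class FirstElementDemo
{
    // Hypothetical sample: two groups, each with its own first <item>.
    public const string Sample =
        "<root>" +
        "<group><item>a</item><item>b</item></group>" +
        "<group><item>c</item><item>d</item></group>" +
        "</root>";

    // Counts the nodes that the given XPath selects in the sample document.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        Console.WriteLine(Count("//item[1]"));   // 2: items "a" and "c"
        Console.WriteLine(Count("(//item)[1]")); // 1: item "a" only
    }
}
```

Without the brackets, [1] is applied separately within each parent, so every group contributes its first item.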
Upvotes: 4