Reputation: 23
Let's say an HTML page contains these three anchors. Using HtmlUnit, I want to get the numbers inside these anchors (as numbers, not as text).
<a class="someclass" href="http://someaddress1.com">3.14</a>
<a class="someclass" href="http://someaddress2.com">1.22</a>
<a class="someclass" href="http://someaddress3.com">6.66</a>
The job has to be done by the following testXPath method:
public static void testXPath () {
WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(false);
webClient.setCssEnabled(false);
try {
final HtmlPage page = (HtmlPage) webClient.getPage("pageurl");
String XPath="//a[@class='someclass']/number()";
List<Object> list = (List<Object>) page.getByXPath(XPath);
for (Object s : list) {
System.out.println(s);
}
} catch (Exception e) {
e.printStackTrace();
}
}
When I run this, I get:
java.lang.RuntimeException: Could not retrieve XPath
Caused by: javax.xml.transform.TransformerException: Unknown nodetype: number
The same error occurs when I want to get only the href values (as String). In this case:
String XPath="//a[@class='someclass']/@href/string()";
But with
String XPath="string(//a[@class='someclass']/@href)";
I get only the first href value, http://someaddress1.com
I know I can get those numbers as strings and then parse them as Double:
List<DomText> list = (List<DomText>) page.getByXPath("//a[@class='someclass']/text()");
for (DomText d : list) {
System.out.println(Double.parseDouble(d.toString()));
}
and I can use .getValue() to get the hrefs:
List<DomAttr> list = (List<DomAttr>) page.getByXPath("//a[@class='someclass']/@href");
for (DomAttr d : list) {
System.out.println(d.getValue());
}
but that is not what I am after. I want to use XPath functions to do it (I'm guessing it's faster).
Upvotes: 0
Views: 1442
Reputation: 43434
As Martin said, this is an XPath 2.0 feature. HtmlUnit does not currently support XPath 2.0, so you cannot use that expression.
I would recommend working around it by doing the parsing outside XPath. It doesn't look that bad, and it is actually the only way to go. You could move the field extraction and parsing into helper methods so the calling code looks cleaner.
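A minimal sketch of that split between extraction and parsing follows. It uses the JDK's built-in javax.xml.xpath evaluator (also XPath 1.0) as a stand-in for page.getByXPath, only so that it runs without HtmlUnit on the classpath. The class AnchorExtractor and the methods selectStrings and selectNumbers are hypothetical names, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; with HtmlUnit, page.getByXPath(...) would replace the JDK evaluator.
class AnchorExtractor {

    // Selects nodes with an XPath 1.0 expression and returns their string values.
    static List<String> selectStrings(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getNodeValue() gives the text of a text node or the value of an attribute node.
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    // Same selection, with the number parsing done in Java instead of in XPath.
    static List<Double> selectNumbers(String xml, String expression) {
        List<Double> numbers = new ArrayList<>();
        for (String value : selectStrings(xml, expression)) {
            numbers.add(Double.parseDouble(value.trim()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String xml = "<html><body>"
                + "<a class=\"someclass\" href=\"http://someaddress1.com\">3.14</a>"
                + "<a class=\"someclass\" href=\"http://someaddress2.com\">1.22</a>"
                + "<a class=\"someclass\" href=\"http://someaddress3.com\">6.66</a>"
                + "</body></html>";
        // prints [3.14, 1.22, 6.66]
        System.out.println(selectNumbers(xml, "//a[@class='someclass']/text()"));
        // prints [http://someaddress1.com, http://someaddress2.com, http://someaddress3.com]
        System.out.println(selectStrings(xml, "//a[@class='someclass']/@href"));
    }
}
```

The calling code then stays at one line per field, and the XPath expressions remain plain XPath 1.0 node selections.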
More detail on why XPath 2.0 is not supported: it is not that HtmlUnit itself refuses to support XPath 2.0. HtmlUnit hands XPath evaluation to org.apache.xpath.* (Xalan), which currently supports only XPath 1.0. If support for the newer XPath version is added there, you will be able to use XPath 2.0 expressions in the getByXPath and getFirstByXPath methods.
Upvotes: 1
Reputation: 167571
The expression //a[@class='someclass']/number() is legal in XPath 2.0 but not in XPath 1.0, so you would need to make sure your Java application plugs in an XPath 2.0 engine such as Saxon 9 if you want to use that syntax. But I doubt that the API you are using (such as getByXPath) is designed with XPath 2.0 in mind and allows you to return, for instance, sequences of values. JAXP allows you to plug in Saxon instead of Xalan, but even then its API does not allow you to return sequences of primitive values.
So often you need to change more than the XPath engine.
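To make the JAXP limitation concrete, here is a small sketch against the JDK's default javax.xml.xpath engine, assuming no alternative engine such as Saxon is on the classpath. The class JaxpReturnTypes, its helper methods and the SAMPLE markup are hypothetical and only mirror the anchors from the question. The result types JAXP offers are NUMBER, STRING, BOOLEAN, NODE and NODESET, so a single number or string comes back from the first matching node, and only NODESET carries every match.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of HtmlUnit or JAXP.
class JaxpReturnTypes {

    static final String SAMPLE = "<body>"
            + "<a class=\"someclass\" href=\"http://someaddress1.com\">3.14</a>"
            + "<a class=\"someclass\" href=\"http://someaddress2.com\">1.22</a>"
            + "<a class=\"someclass\" href=\"http://someaddress3.com\">6.66</a>"
            + "</body>";

    // Evaluates an expression against SAMPLE, asking JAXP for one of its fixed result types.
    static Object eval(String expression, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(SAMPLE)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    // Returns true if the default JAXP engine accepts the expression at compile time.
    static boolean compiles(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String anchors = "//a[@class='someclass']";
        // NUMBER and STRING each yield one value, converted from the first matching node.
        Double first = (Double) eval("number(" + anchors + ")", XPathConstants.NUMBER);
        String href = (String) eval("string(" + anchors + "/@href)", XPathConstants.STRING);
        // NODESET is the only result type that carries all three matches.
        NodeList all = (NodeList) eval(anchors, XPathConstants.NODESET);
        // prints 3.14 http://someaddress1.com 3
        System.out.println(first + " " + href + " " + all.getLength());
        // The XPath 2.0 step syntax is rejected by the XPath 1.0 parser; prints false
        System.out.println(compiles(anchors + "/number()"));
    }
}
```

This is also why string(//a[@class='someclass']/@href) in the question returns only http://someaddress1.com: in XPath 1.0, converting a node-set to a string or number uses the first node in document order.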
Upvotes: 0