Reputation: 251
I am using XPath to read an XHTML document, and I want to read all the elements inside the <p>
tag of the XHTML file. To do that, I am doing something like this:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//p[2]/*");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("Nodes>>>>>>>>" + nodes.item(i).getNodeValue());
}
The XHTML sample looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>test</title></head>
<body>
<p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc</span> </p>
<p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc1</span> </p>
<p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc2</span> </p>
</body>
</html>
But I am unable to get the nodes inside the <p> tag; the code never enters the for loop.
Can anybody help me solve this issue?
Thanks in advance.
Upvotes: 1
Views: 778
Reputation: 10205
You could use XPathAPI (javadoc) to extract your nodes as a generic Java list.
String expr = "//p[2]/*";
// Requires java.util.HashMap, java.util.List and java.util.Map
Map<String, String> ns = new HashMap<String, String>();
ns.put("html", "http://www.w3.org/1999/xhtml");
List<String> nodeValues = XPathAPI.html.selectNodeListAsStrings(doc, expr, ns);
for (String nodeValue : nodeValues) {
    System.out.println("Nodes>>>>>>>> " + nodeValue);
}
or
List<Node> nodes = XPathAPI.html.selectListOfNodes(doc, expr, ns);
for (Node node : nodes) {
    System.out.println("Nodes>>>>>>>> " + node.getTextContent());
}
Disclaimer: I am the author of the XPathAPI library.
Upvotes: 0
Reputation: 8868
XPathExpression expr = xpath.compile(".//*[local-name()='p'][@id='ur_id']");
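Can you try this? I think it will select your node: local-name() matches p regardless of its namespace, and ur_id stands for the id of your element (the sample above has no id attributes, so drop or adapt that predicate). It is also worth reading http://saxon.sourceforge.net/saxon6.5/expressions.html to understand the basics of XPath expressions.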
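As a rough illustration of the local-name() idea applied to the sample from the question, here is a minimal sketch. The class name LocalNameDemo, the helper selectText, and the shortened sample string are made up for the example, and it assumes the document is parsed with a namespace-aware DocumentBuilderFactory (the question does not show how doc was built).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LocalNameDemo {
    // Shortened copy of the XHTML sample from the question (style attributes dropped).
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>test</title></head><body>"
        + "<p class=\"default\"> <span> Test Doc</span> </p>"
        + "<p class=\"default\"> <span> Test Doc1</span> </p>"
        + "<p class=\"default\"> <span> Test Doc2</span> </p>"
        + "</body></html>";

    // Hypothetical helper: evaluates the expression and returns the trimmed
    // text content of every selected node.
    static List<String> selectText(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: namespace-aware parsing
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Element children of the second p, matched by local name only.
        System.out.println(selectText(SAMPLE, "//*[local-name()='p'][2]/*"));
    }
}
```

Matching on local-name() avoids registering a namespace prefix, at the cost of also matching p elements from any other namespace.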
Upvotes: 1
Reputation: 82986
Your code is trying to print the nodeValue of Element nodes, which is always null and is unlikely to be what you want. I expect you want the nodeValue of Text nodes.
Another problem may be namespacing. It looks like your XPath is trying to match p elements in no namespace, when it should probably be trying to match p elements in the http://www.w3.org/1999/xhtml namespace.
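To make both points concrete, here is a minimal sketch using only the standard javax.xml.xpath API. The class name XhtmlNamespaceDemo, the helper childTexts, the prefix "h" and the shortened sample string are all chosen for illustration, and the sketch assumes the document is parsed with a namespace-aware DocumentBuilderFactory (the question does not show how doc was built). It binds a prefix to the XHTML namespace and reads getTextContent() instead of getNodeValue().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Shortened copy of the XHTML sample from the question (style attributes dropped).
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML_NS + "\">"
        + "<head><title>test</title></head><body>"
        + "<p class=\"default\"> <span> Test Doc</span> </p>"
        + "<p class=\"default\"> <span> Test Doc1</span> </p>"
        + "<p class=\"default\"> <span> Test Doc2</span> </p>"
        + "</body></html>";

    // Hypothetical helper: returns the trimmed text content of each selected node.
    static List<String> childTexts(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // assumption: namespace-aware parsing
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the illustrative prefix "h" to the XHTML namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return XHTML_NS.equals(namespaceURI) ? "h" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                        ? Collections.<String>emptyList().iterator()
                        : Collections.singletonList(prefix).iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getNodeValue() is null for Element nodes; getTextContent() returns
            // the concatenated text of the element's descendants.
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed p looks for elements in no namespace.
        System.out.println("//p[2]/*   -> " + childTexts(SAMPLE, "//p[2]/*"));
        // Prefixed p looks for elements in the XHTML namespace.
        System.out.println("//h:p[2]/* -> " + childTexts(SAMPLE, "//h:p[2]/*"));
    }
}
```

With a namespace-aware parse, the unprefixed expression should select nothing, which would explain why the loop in the question is never entered, while the prefixed one should select the span inside the second paragraph.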
Upvotes: 0