user972590

Reputation: 251

Issue with using xpath to read the xhtml tags

I am using XPath to read an XHTML document, and I want to read all the elements inside the <p> tags of the XHTML file. To do that, I am doing something like this:

XPath xpath = XPathFactory.newInstance().newXPath();                
XPathExpression expr = xpath.compile("//p[2]/*");                 
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("Nodes>>>>>>>>"+nodes.item(i).getNodeValue());
}

The XHTML sample looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>test</title></head>
    <body>
        <p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc</span> </p> 
        <p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc1</span> </p>
        <p class="default"> <span style="color: #000000; font-size: 12pt; font-family: sans-serif"> Test Doc2</span> </p>
    </body>
</html>

But I am unable to get the nodes inside the <p> tag; execution never even enters the for loop.

Can anybody help me solve this issue?

Thanks in advance

Upvotes: 1

Views: 778

Answers (3)

gioele

Reputation: 10205

You could use XPathAPI (javadoc) to extract your nodes as a generic Java list.

// The html prefix used here is bound to the XHTML namespace in the map below.
String expr = "//html:p[2]/*";

Map<String, String> ns = new HashMap<String, String>();
ns.put("html", "http://www.w3.org/1999/xhtml");

List<String> nodeValues = XPathAPI.html.selectNodeListAsStrings(doc, expr, ns);
for (String nodeValue : nodeValues) {
    System.out.println("Nodes>>>>>>>> " + nodeValue);
}

or

List<Node> nodes = XPathAPI.html.selectListOfNodes(doc, expr, ns);
for (Node node : nodes) {
    System.out.println("Nodes>>>>>>>> " + node.getTextContent());
}

Disclaimer: I am the author of the XPathAPI library.

Upvotes: 0

Kris

Reputation: 8868

XPathExpression expr = xpath.compile(".//*[local-name()='p'][@id='ur_id']");

Can you try this? I think it will get you your node: local-name() matches the p element whatever namespace it is in, and ur_id stands for the id of the paragraph you want. It is also worth visiting http://saxon.sourceforge.net/saxon6.5/expressions.html to understand the basics of XPath.
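A minimal sketch of the local-name() approach, for illustration only. The sample in the question has no id attributes, so this version assumes a positional predicate ([2]) in place of @id='ur_id'. The class name LocalNameXPath and its select method are hypothetical helpers, and the document is assumed to be parsed with a namespace-aware DocumentBuilderFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the question's code.
final class LocalNameXPath {

    // Child elements of every <p> that is the second <p> of its parent,
    // matched by local name so the element's namespace does not matter.
    static final String EXPR = "//*[local-name()='p'][2]/*";

    // Returns the trimmed text content of each matched element.
    static List<String> select(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // assumption: namespace-aware parsing
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(EXPR, doc, XPathConstants.NODESET);

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent() concatenates the text of all descendants.
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the sample from the question.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p class=\"default\"><span> Test Doc</span></p>"
                + "<p class=\"default\"><span> Test Doc1</span></p>"
                + "<p class=\"default\"><span> Test Doc2</span></p>"
                + "</body></html>";
        System.out.println(select(xhtml)); // prints [Test Doc1]
    }
}
```

Matching on local-name() avoids registering a namespace prefix, at the cost of also matching p elements from any other namespace that happens to be in the document.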

Upvotes: 1

Alohci

Reputation: 82986

Your code is trying to print the nodeValue of Element nodes, which is always null in the DOM and is unlikely to be what you want. I expect you want the nodeValue of the Text nodes inside them.

Another problem may be namespacing. It looks like your XPath is trying to match p elements in no namespace, when it should probably be trying to match p elements in the http://www.w3.org/1999/xhtml namespace.
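Both points can be sketched with the standard javax.xml.xpath API alone, roughly as follows. The class name XhtmlParagraphText, the method name, and the prefix h are assumptions made for this illustration; any prefix works as long as the NamespaceContext maps it to the XHTML namespace URI, and the parser has to be namespace-aware for the mapping to have any effect.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the question's code.
final class XhtmlParagraphText {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Maps the (arbitrarily chosen) prefix "h" to the XHTML namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "h".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "h" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
        }
    };

    // Returns the trimmed text of each child element of the second <p>.
    static List<String> childTextOfSecondParagraph(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate("//h:p[2]/*", doc, XPathConstants.NODESET);

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getNodeValue() is null for elements; getTextContent() returns their text.
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the document", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the sample from the question.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p class=\"default\"><span> Test Doc</span></p>"
                + "<p class=\"default\"><span> Test Doc1</span></p>"
                + "</body></html>";
        System.out.println(childTextOfSecondParagraph(xhtml)); // prints [Test Doc1]
    }
}
```

If only the Text nodes themselves are wanted, an expression such as //h:p[2]//text() should select them directly, and getNodeValue() then returns their content.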

Upvotes: 0
