Reputation: 11519
I'm trying to use dom4j to parse an XHTML document. If I simply print out the document, I can see the entire document, so I know it is being loaded correctly. The two divs that I'm trying to select are at exactly the same level in the document.
html
  body
    div
      table
        tbody
          tr
            td
              table
                tbody
                  tr
                    td
                      div class="definition"
                      div class="example"
My code is
List<Element> list = document.selectNodes("//html/body/div/table/tbody/tr/td/table/tbody/tr/td");
but the list is empty when I do System.out.println(list);
If I only do List<Element> list = document.selectNodes("//html");
it does actually return a list with one element in it. So I'm confused about what's wrong with my XPath and why it won't find those divs.
Upvotes: 1
Views: 3103
Reputation: 3636
An alternative could be:
//div[@class='definition' or @class='example']
This searches for "div" elements anywhere in the document whose "class" attribute equals "definition" or "example".
I find this approach illustrates more clearly what you are trying to retrieve from the page. An added benefit is that if the structure of the page changes but the div classes stay the same, your XPath doesn't need to be updated.
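As a rough illustration of the expression above, here is a minimal sketch that runs it with the JDK's built-in javax.xml.xpath API rather than dom4j, so it needs no extra jars. The sample markup, the class name ClassPredicateDemo and the helper matchedClasses are made up for the example, and the sample deliberately declares no namespace; if the real page declares the XHTML namespace, the unprefixed name div may not match as written.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassPredicateDemo {
    // Hypothetical sample page (assumption: no namespace is declared).
    static final String SAMPLE =
        "<html><body><div><table><tbody><tr><td>"
        + "<div class=\"definition\">a term</div>"
        + "<div class=\"example\">a usage</div>"
        + "<div class=\"other\">ignored</div>"
        + "</td></tr></tbody></table></div></body></html>";

    // Returns the class attribute of every div selected by the predicate, in document order.
    static List<String> matchedClasses(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            "//div[@class='definition' or @class='example']", doc, XPathConstants.NODESET);
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            classes.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return classes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(matchedClasses(SAMPLE)); // prints [definition, example]
    }
}
```

The outer div (no class attribute) and the div with class "other" are not selected, which is the point of filtering on the attribute instead of spelling out the full path.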
You can also check that your XPath works against an HTML document using the following Firefox plugin, which is very useful.
Firefox Plugin - XPath Checker 0.4.4
Upvotes: 1
Reputation: 2710
What about just "//div"? Or "//html/body/div/table/tbody"? I've found long literal XPath expressions hard to debug, as it's easy for my eyes to get tricked... so I break them down until it DOES work and then build back up again.
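The break-it-down approach can also be automated. Below is a minimal sketch, again using the JDK's javax.xml.xpath API instead of dom4j so that it is self-contained. The class XPathPrefixDebug, the helper firstFailingPrefix and the sample markup are hypothetical, and the helper assumes a simple path that starts with a single "/" and has no predicates containing "/". The sample table intentionally has no tbody, purely to show where a path can stop matching.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPrefixDebug {
    // Hypothetical sample: note that the table has no tbody element.
    static final String SAMPLE =
        "<html><body><div><table><tr><td>cell</td></tr></table></div></body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates /a, /a/b, /a/b/c, ... and returns the shortest prefix that
    // selects nothing, or null if the whole path matches at least one node.
    static String firstFailingPrefix(Document doc, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        StringBuilder prefix = new StringBuilder();
        for (String step : path.substring(1).split("/")) {
            prefix.append('/').append(step);
            NodeList hits = (NodeList) xpath.evaluate(
                prefix.toString(), doc, XPathConstants.NODESET);
            System.out.println(prefix + " -> " + hits.getLength() + " node(s)");
            if (hits.getLength() == 0) {
                return prefix.toString();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String failing = firstFailingPrefix(parse(SAMPLE), "/html/body/div/table/tbody/tr/td");
        System.out.println("first failing prefix: " + failing);
    }
}
```

With this sample, the first prefix that selects nothing is /html/body/div/table/tbody, so that is the step to look at first.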
Upvotes: 1
Reputation: 18675
Try declaring the XHTML namespace for the XPath expression, e.g. bind it to the prefix x
and use //x:html/x:body...
as the XPath expression (see also this article, which is for Groovy rather than plain Java). Something like the following should probably work in Java:
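import java.util.Map;
import java.util.TreeMap;
import org.dom4j.xpath.DefaultXPath;
// Bind the prefix "x" to the XHTML namespace so the prefixed steps can match namespaced elements.
DefaultXPath xpath = new DefaultXPath("//x:html/x:body/...");
Map<String,String> namespaces = new TreeMap<String,String>();
namespaces.put("x","http://www.w3.org/1999/xhtml");
xpath.setNamespaceURIs(namespaces);
list = xpath.selectNodes(document);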
(untested)
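To see the effect of the namespace binding in isolation, here is a small sketch that uses the JDK's javax.xml.xpath API instead of dom4j (a deliberate swap, so it runs without extra dependencies). The NamespaceContext plays roughly the role that setNamespaceURIs plays in the snippet above. The class XhtmlNamespaceDemo, the helper count and the sample markup are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample page that declares the XHTML namespace as its default.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div class=\"definition\">a term</div>"
        + "<div class=\"example\">a usage</div>"
        + "</body></html>";

    // Counts the nodes selected by the expression, with prefix "x" bound to XHTML.
    static int count(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required so elements keep their namespace
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "x" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singletonList("x").iterator()
                    : Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count(SAMPLE, "//html/body/div"));       // prints 0
        System.out.println(count(SAMPLE, "//x:html/x:body/x:div")); // prints 2
    }
}
```

In XPath 1.0 an unprefixed name only matches elements in no namespace, which is why the first expression selects nothing here while the prefixed one finds both divs.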
Upvotes: 3