Matt Phillips

Reputation: 11519

dom4j XPath not working parsing xhtml document

I'm trying to use dom4j to parse an xhtml document. If I simply print out the document I can see the entire document so I know it is being loaded correctly. The two divs that I'm trying to select are at the exact same level in the document.

html
  body
    div
     table
      tbody
       tr
        td
         table
           tbody
            tr
             td
              div class="definition"
              div class="example"

My code is

List<Element> list = document.selectNodes("//html/body/div/table/tbody/tr/td/table/tbody/tr/td");

but the list is empty when I print it with System.out.println(list);

If I only do List<Element> list = document.selectNodes("//html"); it does return a list with one element in it. So I'm confused about what's wrong with my XPath and why it won't find those divs.

Upvotes: 1

Views: 3103

Answers (3)

bobmarksie

Reputation: 3636

An alternative could be:

//div[@class='definition' or @class='example']

This searches for "div" elements anywhere in the document whose "class" attribute value equals "definition" or "example".

I find this approach illustrates more clearly what you are trying to retrieve from the page. An added benefit is that if the structure of the page changes but the div classes stay the same, your XPath doesn't need to be updated.

You can also check that your XPath works against an HTML document using the following Firefox plugin, which is very useful.
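To illustrate the point about structure changes, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath API instead of dom4j so that it runs without extra dependencies, and the class name, method name and sample markup are all made up for the example. It also assumes the documents declare no namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassSelectorDemo {

    // The expression from this answer.
    static final String BY_CLASS = "//div[@class='definition' or @class='example']";

    // Parses the given XML string and counts the nodes matched by the expression.
    static int countMatches(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample pages: same divs, different nesting depth.
        String shallow = "<html><body><div class=\"definition\"/>"
                + "<div class=\"example\"/><div class=\"other\"/></body></html>";
        String deep = "<html><body><table><tr><td><div class=\"definition\"/>"
                + "<div class=\"example\"/></td></tr></table></body></html>";

        System.out.println(countMatches(shallow, BY_CLASS)); // 2
        System.out.println(countMatches(deep, BY_CLASS));    // 2
    }
}
```

The same expression finds both divs in either layout, whereas a full literal path would have to be rewritten for each one.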

Firefox Plugin - XPath Checker 0.4.4

Upvotes: 1

Rodney Gitzel

Reputation: 2710

What about just "//div"? Or "//html/body/div/table/tbody"? I've found long literal XPath expressions hard to debug, as it's easy for my eyes to get tricked, so I break them down until one DOES work and then build back up again.
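That break-it-down approach can be automated. The following is only a sketch: it uses the JDK's javax.xml.xpath API rather than dom4j so it runs standalone, the class and method names are invented, and the sample document (which deliberately has no tbody element) is hypothetical. It evaluates progressively longer prefixes of the path and reports the first one that matches nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPrefixDebugger {

    // Parses an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Builds "//step1", "//step1/step2", ... and returns the first prefix
    // that selects no nodes, or null if every prefix matches something.
    static String firstEmptyPrefix(Document doc, String... steps) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        StringBuilder path = new StringBuilder("/");
        for (String step : steps) {
            path.append('/').append(step);
            NodeList hits = (NodeList) xpath.evaluate(path.toString(), doc, XPathConstants.NODESET);
            System.out.println(path + " -> " + hits.getLength() + " node(s)");
            if (hits.getLength() == 0) {
                return path.toString();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: the table has no tbody in the actual markup.
        Document doc = parse("<html><body><div><table><tr><td>x</td></tr>"
                + "</table></div></body></html>");
        String broken = firstEmptyPrefix(doc, "html", "body", "div", "table", "tbody", "tr", "td");
        System.out.println("First prefix with no matches: " + broken);
    }
}
```

For this made-up input the walk stops at //html/body/div/table/tbody, which points straight at the step to inspect. Whether that is the step that fails for the real page depends on its actual markup.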

Upvotes: 1

Andre Holzner

Reputation: 18675

Try declaring the XHTML namespace for the XPath, e.g. bind it to the prefix x and use //x:html/x:body... as the XPath expression (see also this article, which is for Groovy rather than plain Java). Something like the following should probably do it in Java:

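// Needs: java.util.Map, java.util.TreeMap, org.dom4j.xpath.DefaultXPath
DefaultXPath xpath = new DefaultXPath("//x:html/x:body/...");

// Bind the prefix "x" to the XHTML namespace URI
Map<String,String> namespaces = new TreeMap<String,String>();
namespaces.put("x","http://www.w3.org/1999/xhtml");
xpath.setNamespaceURIs(namespaces);

// Evaluate the namespace-aware expression against the parsed document
list = xpath.selectNodes(document);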

(untested)
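To show the effect in a form that can be run without dom4j on the classpath, here is a sketch of the same idea using the JDK's built-in javax.xml.xpath API (a swapped-in API, not the dom4j calls above). It assumes the document declares the XHTML default namespace; the class name, method name and sample markup are made up for the example.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlNamespaceDemo {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample: every element inherits the XHTML default namespace.
    static final String SAMPLE =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<div class=\"definition\">d</div><div class=\"example\">e</div>"
            + "</body></html>";

    // Counts the nodes the expression selects in SAMPLE, with prefix "x" bound to XHTML.
    static int count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            // Reverse lookups are not needed for evaluating expressions here.
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//html/body/div"));       // 0: unprefixed names mean "no namespace"
        System.out.println(count("//x:html/x:body/x:div")); // 2: prefix resolves to the XHTML namespace
    }
}
```

In XPath 1.0 an unprefixed element name only matches elements in no namespace, which is why the first expression finds nothing here while the prefixed one finds both divs.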

Upvotes: 3
