Parsing XML with lxml and elementtree

Question

I'm trying to parse an XML document and return the nodes that have a ref attribute. A toy example works, but the real document returns an empty list when it should show a match.

toy example

import elementtree.ElementTree
from lxml import etree
tree = etree.XML('CatsDogsBirds')
# I can return the relevant input nodes with:
print len(tree.findall(".//input[@ref]"))
2

But working with the following (reduced) file for some reason fails:

example.xml



  
    A title
  
  
    
      Group 1
      
        Field 1

script

import elementtree.ElementTree
from lxml import etree
with open("example.xml", "r") as myfile:
    xml = myfile.read()
tree = etree.XML(xml)
print len(tree.findall(".//input[@ref]"))
0

Any idea why this fails, and how to work around it? I think it may have something to do with the XML header. Very grateful for any assistance.

sideshowbarker · Accepted Answer

I think the problem is that the elements in your document are in specific namespaces, so the un-namespaced expression .findall(".//input[@ref]") doesn't match the input element in the document. That element is actually a namespaced input element, in the http://www.w3.org/2002/xforms namespace.
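One way to check this is to print the tags the parser actually stores. The sketch below uses the standard library's xml.etree.ElementTree (lxml.etree accepts the same calls), and the SAMPLE document is a made-up stand-in, since I'm only assuming what the real example.xml looks like:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for example.xml: the root declares xforms as the
# default namespace, so every child element inherits it.
SAMPLE = """<group xmlns="http://www.w3.org/2002/xforms">
  <label>Group 1</label>
  <input ref="field1"><label>Field 1</label></input>
</group>"""

root = ET.fromstring(SAMPLE)

# Tags are stored in Clark notation: {namespace-uri}localname.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)

# The bare name "input" is not equal to "{...}input", so nothing matches.
unqualified = root.findall(".//input[@ref]")
print(len(unqualified))
```

If the printed tags start with a {...} prefix, an un-namespaced path will not match them.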

So maybe try this:

.findall(".//{http://www.w3.org/2002/xforms}input[@ref]")
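As a minimal sketch of the namespace-qualified lookup: the document below is a hypothetical reconstruction (the element layout and the field1/xf names are my assumptions, not your actual file), and it uses the standard library's xml.etree.ElementTree, though lxml.etree takes the same findall arguments. It shows the full-URI form and an equivalent prefix-map form:

```python
import xml.etree.ElementTree as ET

XFORMS = "http://www.w3.org/2002/xforms"

# Hypothetical document: xhtml wrapper elements, xforms as the default namespace.
SAMPLE = """<h:html xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns="http://www.w3.org/2002/xforms">
  <h:head><h:title>A title</h:title></h:head>
  <h:body>
    <group>
      <label>Group 1</label>
      <input ref="field1"><label>Field 1</label></input>
      <input><label>Field 2</label></input>
    </group>
  </h:body>
</h:html>"""

root = ET.fromstring(SAMPLE)

# Option 1: spell out the namespace URI in braces (Clark notation).
clark = root.findall(".//{%s}input[@ref]" % XFORMS)

# Option 2: pass a prefix map. The prefix "xf" is arbitrary and does not
# have to match any prefix used in the document itself.
prefixed = root.findall(".//xf:input[@ref]", namespaces={"xf": XFORMS})

print(len(clark), len(prefixed))
```

Both calls select only the input element that carries a ref attribute. The prefix-map form tends to be easier to read once a path mentions more than one namespaced element.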

Update: my original answer used the xhtml namespace; it now uses the xforms namespace instead, as noted in another answer.
