Reputation: 728
I have XML like this:
<?xml version="1.0" ?>
<iq id="123" to="test" type="result">
<query xmlns="jabber:iq:roster">
<item jid="foo" subscription="both"/>
<item jid="bar" subscription="both"/>
</query>
</iq>
And I would like to parse the jid from each item into an array. I thought something like this would work:
import xml.etree.ElementTree as ET
myarr = []
xml = '<?xml version="1.0" ?><iq id="123" to="test" type="result"><query xmlns="jabber:iq:roster"><item jid="foo" subscription="both"/><item jid="bar" subscription="both"/></query></iq>'
root = ET.fromstring(xml)
for item in root.findall('query'):
    t = item.get('jid')
    myarr.append(t)
    print(t)
Upvotes: 0
Views: 2711
Reputation: 22453
I endorse @alecxe's approach, which I will label "handle the namespaces." That is the most general and correct approach. Unfortunately, namespaces are often ugly and wordy, and they needlessly complicate XPath expressions.
For the many simple cases where namespaces are an artifact of the XML world's desire for über-precision and not truly necessary to identify the nodes in a document, a simpler "eliminate the namespaces" alternative allows more concise searches. The key routine is:
def strip_namespaces(tree):
    """
    Strip the namespaces from an ElementTree in order to make
    processing easier. Adapted from @nonagon's answer
    at http://stackoverflow.com/a/25920989/240490
    """
    for el in tree.iter():
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]  # strip namespaces
        # iterate over a copy, since attrib is modified inside the loop
        for k, v in list(el.attrib.items()):
            if '}' in k:
                newkey = k.split('}', 1)[1]
                el.attrib[newkey] = v
                del el.attrib[k]
    return tree
Then the program continues much as before, but without worrying about those pesky namespaces:
root = ET.fromstring(xml)
strip_namespaces(root)
for item in root.findall('.//item'):
    t = item.attrib.get('jid')
    myarr.append(t)
    print(t)
This is not effective if you are trying to modify the ElementTree and re-emit XML, but if you're just trying to deconstruct and grab data from the tree, it works well.
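If rewriting the tree is undesirable, a related option is the namespace wildcard in the search path. This is a minimal sketch, assuming Python 3.8 or later, where ElementTree accepts `{*}tag` to match a tag in any namespace (or none). The variable name `jids` is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

xml = ('<?xml version="1.0" ?>'
       '<iq id="123" to="test" type="result">'
       '<query xmlns="jabber:iq:roster">'
       '<item jid="foo" subscription="both"/>'
       '<item jid="bar" subscription="both"/>'
       '</query></iq>')

root = ET.fromstring(xml)

# '{*}item' matches <item> regardless of its namespace (Python 3.8+ assumed);
# the tree itself is left untouched, so it can still be re-serialized.
jids = [item.get('jid') for item in root.findall('.//{*}item')]
print(jids)
```

Unlike stripping, this leaves the element tags as they were, at the cost of repeating the wildcard on every path step that needs it.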
Upvotes: 1
Reputation: 473873
You need to handle namespaces. One option would be to paste the namespace into the XPath expression:
for item in root.findall('.//{%(ns)s}query/{%(ns)s}item' % {'ns': 'jabber:iq:roster'}):
    t = item.attrib.get('jid')
    myarr.append(t)
    print(t)
Prints:
foo
bar
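A variation on the same idea, as a minimal sketch: `findall` also accepts a `namespaces` mapping from prefix to URI, which keeps the path itself shorter. The prefix `r` and the names `ns` and `jids` below are arbitrary choices for illustration; only the URI comes from the XML.

```python
import xml.etree.ElementTree as ET

xml = ('<?xml version="1.0" ?>'
       '<iq id="123" to="test" type="result">'
       '<query xmlns="jabber:iq:roster">'
       '<item jid="foo" subscription="both"/>'
       '<item jid="bar" subscription="both"/>'
       '</query></iq>')

root = ET.fromstring(xml)

# Map an arbitrary prefix to the namespace URI declared on <query>.
ns = {'r': 'jabber:iq:roster'}

# The prefix in the path is resolved through the mapping passed to findall.
jids = [item.get('jid') for item in root.findall('.//r:query/r:item', ns)]
print(jids)
```

The prefix only has meaning inside the search path, so it does not need to match anything in the document.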
Upvotes: 1