Reputation: 23122
I have an XML file with elements that look like gnc:account (it's a GnuCash accounts file). I want to find all elements with that name.
However, if I do this:
for account in tree.iter('gnc:account'):
    print(account)
I get nothing printed. Instead I have written this ridiculous piece of code:
def n(string):
    pair = string.split(':')
    return '{{{}}}{}'.format(root.nsmap[pair[0]], pair[1])
And now I can do this:
for account in tree.iter(n('gnc:account')):
    print(account)
which works.
Is there a non-ridiculous solution to this problem? I'm not interested in writing out the full URI.
Upvotes: 0
Views: 129
Reputation: 22647
What you have now certainly is too hackish, in my opinion.
Solution with XPath
You could use XPath, and register this namespace URI and prefix:
>>> from io import StringIO
>>> from lxml import etree
>>> s = """<root xmlns:gnc="www.gnc.com">
... <gnc:account>1</gnc:account>
... <gnc:account>2</gnc:account>
... </root>"""
>>> tree = etree.parse(StringIO(s))
# show that without the prefix, there are no results
>>> tree.xpath("//account")
[]
# with an unregistered prefix, throws an error
>>> tree.xpath("//gnc:account")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 2287, in lxml.etree._ElementTree.xpath
  File "src/lxml/xpath.pxi", line 359, in lxml.etree.XPathDocumentEvaluator.__call__
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Undefined namespace prefix
# correct way of registering the namespace
>>> tree.xpath("//gnc:account", namespaces={'gnc': 'www.gnc.com'})
[<Element {www.gnc.com}account at 0x112bdd808>, <Element {www.gnc.com}account at 0x112bdd948>]
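If the goal is to avoid repeating the URI at every call site, one option is to keep the prefix map in a single place and pass it along. The sketch below is written against the standard library's xml.etree.ElementTree so that it runs without lxml installed; lxml.etree's findall() takes the same namespaces argument. The names XML and NS are made up for this illustration.

```python
import xml.etree.ElementTree as ET

XML = """<root xmlns:gnc="www.gnc.com">
  <gnc:account>1</gnc:account>
  <gnc:account>2</gnc:account>
</root>"""

# Hypothetical module-level prefix map: the URI is written once, here.
NS = {"gnc": "www.gnc.com"}

root = ET.fromstring(XML)

# The "gnc" prefix in the path is resolved through NS,
# not through the prefix declarations in the document.
accounts = root.findall(".//gnc:account", namespaces=NS)
texts = [account.text for account in accounts]
print(texts)
```

Because the prefix is looked up in NS, the key used there does not have to be the same prefix the file happens to declare; only the URI has to match.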
Sticking with tree.iter()
If you would still like to call iter() in this fashion, you need to follow lxml's advice on using namespaces with iter and spell out the fully qualified name, for instance:
>>> for account in tree.iter('{www.gnc.com}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
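To keep the {uri}name form readable, the braced URI can be stored once in a constant and concatenated with the local name. This is only a sketch: GNC is a hypothetical constant, and the standard library's xml.etree.ElementTree is used so the snippet runs on its own. The same tag string can be passed to lxml's iter().

```python
import xml.etree.ElementTree as ET

XML = """<root xmlns:gnc="www.gnc.com">
  <gnc:account>1</gnc:account>
  <gnc:account>2</gnc:account>
</root>"""

# Hypothetical constant: the namespace URI wrapped in braces, written once.
GNC = "{www.gnc.com}"

root = ET.fromstring(XML)

# GNC + "account" evaluates to "{www.gnc.com}account".
found = [account.text for account in root.iter(GNC + "account")]
print(found)
```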
And if you absolutely want to avoid writing out the namespace URI or registering the namespace (which I do not think is a good reason, since registering it is easy and clearer), you could also use a wildcard that matches account in any namespace:
>>> for account in tree.iter('{*}account'):
...     print(account)
...
<Element {www.gnc.com}account at 0x112bdd808>
<Element {www.gnc.com}account at 0x112bdd948>
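One caveat with the wildcard is that it matches the local name in every namespace, not just gnc. The sketch below illustrates this with a made-up second namespace (www.example.com/other is an assumption, not something from a GnuCash file). It uses findall() from the standard library's xml.etree.ElementTree, which understands {*} in paths from Python 3.8 on, so the snippet runs without lxml.

```python
import xml.etree.ElementTree as ET

# The "other" namespace is invented purely to show the difference.
XML = """<root xmlns:gnc="www.gnc.com" xmlns:other="www.example.com/other">
  <gnc:account>1</gnc:account>
  <other:account>not a gnc account</other:account>
</root>"""

root = ET.fromstring(XML)

# Wildcard: every element whose local name is "account".
wildcard_tags = [elem.tag for elem in root.findall(".//{*}account")]

# Fully qualified name: only elements in the gnc namespace.
exact_tags = [elem.tag for elem in root.findall(".//{www.gnc.com}account")]

print(wildcard_tags)
print(exact_tags)
```

If the file only ever uses one namespace for account, the two queries return the same elements and the wildcard is harmless.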
Upvotes: 2