sagar

Reputation: 387

Python ElementTree unable to parse xml file correctly

I am trying to parse an XML file using Python's ElementTree. The XML file looks like this:

<App xmlns="test attribute">
    <name>sagar</name>
</App>

Parser Code:

import xml.etree.ElementTree as etree


def parser():
    # Parse the file and print the root element's tag and attributes.
    eleTree = etree.parse('app.xml')
    eleRoot = eleTree.getroot()
    print("Tag:"+str(eleRoot.tag)+"\nAttrib:"+str(eleRoot.attrib))


if __name__ == "__main__":
    parser()

Output:

[sagar@linux Parser]$ python test.py
Tag:{test attribute}App  <------------- It should print only "App"
Attrib:{}

When I remove the "xmlns" attribute, or rename it to something else, eleRoot.tag prints the correct value. Why is ElementTree unable to parse the tag properly when the tag has an "xmlns" attribute? Am I missing some prerequisite for parsing XML of this format with ElementTree?

Upvotes: 0

Views: 1601

Answers (1)

Jake Conkerton-Darby

Reputation: 1101

Your XML uses the attribute xmlns, which declares a default XML namespace rather than an ordinary attribute. XML namespaces are used to solve naming conflicts, and their value is expected to be a URI, so "test attribute" is not a proper namespace name. ElementTree does not reject it, though: it accepts the value as the namespace and does not list xmlns in the element's attrib dictionary, which is why Attrib prints as {}.

For more information on XML namespaces, see XML Namespaces at W3Schools.


Edit:

After looking into the issue further, it appears that the fully qualified name of an element in Python's ElementTree has the form {namespace_uri}tag_name. This means that, because you declared "test attribute" as the default namespace, the fully qualified name of your "App" tag is in fact {test attribute}App, which is exactly what your program prints. The parser is behaving as designed.

Source
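If you only want the local name ("App"), one option is to split the qualified tag yourself. The following is a minimal sketch, not an official recipe: split_tag is a hypothetical helper name I made up for illustration, and the XML is inlined as a string instead of being read from app.xml. It also shows that child elements inherit the default namespace, so lookups such as find() have to be namespace-qualified too (here through an arbitrary prefix "app" mapped in a dictionary).

```python
import xml.etree.ElementTree as ET

# Same document as in the question, inlined for the example.
XML = '<App xmlns="test attribute"><name>sagar</name></App>'


def split_tag(tag):
    """Hypothetical helper: split '{namespace}local' into (namespace, local).

    Returns an empty namespace string when the tag is not qualified.
    """
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


root = ET.fromstring(XML)
namespace, local = split_tag(root.tag)
print(local)      # App
print(namespace)  # test attribute

# <name> lives in the same default namespace, so map a prefix of our
# choosing ("app" is arbitrary) to that namespace and use it in the path.
ns = {"app": namespace}
name = root.find("app:name", ns)
print(name.text)  # sagar
```

Note that root.find("name") without the namespace would return None here, because the element's real tag is {test attribute}name.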

Upvotes: 2
