Reputation: 387
I am trying to parse an XML file using Python's ElementTree. The XML file looks like this:
<App xmlns="test attribute">
<name>sagar</name>
</App>
Parser Code:
import xml.etree.ElementTree as etree
def parser():
    # Parse the file and fetch the root element
    eleTree = etree.parse('app.xml')
    eleRoot = eleTree.getroot()
    print("Tag:" + str(eleRoot.tag) + "\nAttrib:" + str(eleRoot.attrib))
if __name__ == "__main__":
    parser()
Output:
[sagar@linux Parser]$ python test.py
Tag:{test attribute}App <------------- It should print only "App"
Attrib:{}
When I remove the "xmlns" attribute or rename it to something else, eleRoot.tag prints the correct value. Why is ElementTree unable to parse the tag properly when the tag has an "xmlns" attribute? Am I missing some prerequisite for parsing XML of this format with ElementTree?
Upvotes: 0
Views: 1601
Reputation: 1101
Your XML uses the attribute xmlns, which defines a default XML namespace. XML namespaces are used to resolve naming conflicts, and their value is supposed to be a URI. The value "test attribute" is not a valid one, which appears to be what is affecting how etree reports the tag.
For more information on xml namespaces see XML Namespaces at W3 Schools.
Edit:
After looking into the issue further, it appears that the fully qualified name of an element in Python's ElementTree has the form {namespace_url}tag_name. This means that, because you defined the default namespace as "test attribute", the fully qualified name of your "App" tag is in fact {test attribute}App, which is what your program prints.
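If you only want the local part of the tag, one possible approach is to strip the {namespace} prefix yourself. The sketch below is only an illustration: it inlines the XML from the question instead of reading app.xml, local_name is a hypothetical helper (not part of ElementTree), and the prefix 'a' in the namespace map is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined here instead of reading app.xml.
XML = '<App xmlns="test attribute"><name>sagar</name></App>'


def local_name(tag):
    """Hypothetical helper: drop a leading '{namespace}' part, if present."""
    if tag.startswith('{'):
        return tag.split('}', 1)[1]
    return tag


root = ET.fromstring(XML)
print(root.tag)              # qualified name, including the namespace
print(local_name(root.tag))  # local name only

# Child elements inherit the default namespace, so lookups need it as well.
ns = {'a': 'test attribute'}    # 'a' is an arbitrary prefix chosen for this sketch
name = root.find('a:name', ns)
print(name.text)
```

Note that a plain root.find('name') would return None here, since the child's qualified tag also carries the default namespace.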
Upvotes: 2