Python2 extracting tags from xml

Question

I have an XML document that I need to parse, but I am stuck at the very beginning. Here is part of the XML file.

I want to print only the element tags. I am using this piece of code from the Python docs, entering these commands in the Python interpreter.

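import xml.etree.ElementTree as ET

tree = ET.parse('pom.xml')
root = tree.getroot()
# Alternative from the docs, when the XML is already in a string named data:
# root = ET.fromstring(data)
root.tag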

root.tag returns this

{http://maven.apache.org/POM/4.0.0}project

Is the expected result just

project

?

AKX · Accepted Answer

Python is parsing your XML in a way that keeps the declared namespaces and thus does not lose data, so the expected result is not just project :)

The {http://maven.apache.org/POM/4.0.0}project you see is a namespace-qualified name for the tag.

Even though the start tag does not contain a namespace prefix, the xmlns="http://maven.apache.org/POM/4.0.0" attribute that immediately follows it declares that every tag without an explicit namespace prefix (that element and its descendants) belongs to that namespace.
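To see the default namespace at work, here is a small sketch. The SAMPLE string is a made-up, minimal POM-like document (an assumption, not the asker's actual pom.xml); only the namespace URI is taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document; only the namespace URI comes from the question.
SAMPLE = (
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<modelVersion>4.0.0</modelVersion>'
    '</project>'
)

root = ET.fromstring(SAMPLE)

# Neither tag is written with a prefix, yet both inherit the default namespace.
root_tag = root.tag
child_tag = root[0].tag

# Lookups therefore need the qualified name as well.
found = root.find('{http://maven.apache.org/POM/4.0.0}modelVersion')
missing = root.find('modelVersion')  # None: the bare name does not match
```

The child element picks up the namespace even though the xmlns attribute appears only on the root, which is why a search for the bare name finds nothing.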



If you absolutely want a non-namespace-qualified name, you can of course do tag_name = element.tag.split("}", 1)[-1]. (This is also safe for names that are not namespace-qualified: with no "}" to split on, the result is a one-element list, and the -1 index returns the whole name.)
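Wrapped in a helper, that expression might look like the sketch below. The function name local_name is my own choice for illustration, not something ElementTree provides.

```python
def local_name(tag):
    """Return the tag without its {namespace} part, if it has one."""
    # "{uri}project" splits into ["{uri", "project"]; a bare "project"
    # splits into ["project"]. The last item is the local name either way.
    return tag.split("}", 1)[-1]


qualified = local_name("{http://maven.apache.org/POM/4.0.0}project")
unqualified = local_name("project")
```

Both calls give back the plain name, so the helper can be applied without first checking whether a tag is qualified.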

And of course you can recursively walk an ElementTree tree and replace every element's tag with its non-namespace-qualified name, using the above expression, if you really really want to.
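A rough sketch of such a walk follows. The name strip_namespaces and the sample document are assumptions made for illustration; the code rewrites tags in place and leaves attribute names untouched. Element.iter() already visits the whole subtree, so no explicit recursion is needed.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(element):
    """Rewrite element and all of its descendants to use bare tag names."""
    for node in element.iter():
        # Comments and processing instructions carry a non-string tag; skip them.
        if hasattr(node.tag, "split"):
            node.tag = node.tag.split("}", 1)[-1]
    return element


# Hypothetical sample, not the asker's actual pom.xml.
root = ET.fromstring(
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<groupId>example</groupId>'
    '<build><plugins/></build>'
    '</project>'
)
strip_namespaces(root)
tags = [node.tag for node in root.iter()]
```

Keep in mind that this throws the namespace information away: after stripping, elements from different namespaces that share a local name can no longer be told apart.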
