Reputation: 426
I have an XML document that I need to parse, but I am stuck at, I would say, the very beginning. Here is part of the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
I want to print out only the element tags. I use this piece of code from the Python docs, issuing these commands at the Python interpreter:
import xml.etree.ElementTree as ET

tree = ET.parse('pom.xml')
root = tree.getroot()
# Alternative when the XML is already in a string named data:
# root = ET.fromstring(data)
root.tag
root.tag returns this
{http://maven.apache.org/POM/4.0.0}project
Shouldn't the expected result be just project?
Upvotes: 0
Views: 64
Reputation: 169398
Python parses your XML in a way that keeps the declared namespaces, so no information is lost. That is why the result is not just project :)
The {http://maven.apache.org/POM/4.0.0}project you see is the namespace-qualified name of the tag; ElementTree writes such names in the form {namespace-uri}localname.
Even though the start tag <project does not contain a namespace prefix, the xmlns="http://maven.apache.org/POM/4.0.0" attribute immediately following it declares a default namespace: every element without an explicit namespace prefix (this one and everything nested inside it) belongs to that namespace.
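A minimal sketch illustrating this, using a trimmed-down, hypothetical POM string (the <modelVersion> child is an assumption added for illustration; only the xmlns declarations mirror the question). Both the root and its unprefixed child come back namespace-qualified, and the prefixed xsi:schemaLocation attribute is qualified with the xsi namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down POM; only the namespace declarations mirror the question.
data = """<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
</project>"""

root = ET.fromstring(data)

# The root has no prefix in the source, but it is in the default namespace.
print(root.tag)  # {http://maven.apache.org/POM/4.0.0}project

# Unprefixed children are in the same default namespace.
for child in root:
    print(child.tag)  # {http://maven.apache.org/POM/4.0.0}modelVersion

# The xmlns declarations themselves are not kept in attrib;
# the prefixed xsi:schemaLocation attribute is namespace-qualified.
print(list(root.attrib))  # ['{http://www.w3.org/2001/XMLSchema-instance}schemaLocation']
```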
If you absolutely want a non-namespace-qualified name, you can of course do tag_name = element.tag.split("}", 1)[-1]. (This is also safe for names without a namespace: split then returns a one-element list, and the -1 index picks that single element.)
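The expression above, wrapped in a small helper (local_name is a hypothetical name chosen for this sketch), behaves the same for qualified and unqualified tags:

```python
def local_name(tag: str) -> str:
    """Strip a leading '{namespace-uri}' from an ElementTree tag, if present."""
    # maxsplit=1: split only at the first '}'.
    # Without a namespace, split returns [tag], so [-1] still yields the whole name.
    return tag.split("}", 1)[-1]


print(local_name("{http://maven.apache.org/POM/4.0.0}project"))  # project
print(local_name("project"))  # project
```

In the question's session this would be local_name(root.tag).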
And of course, if you really, really want to, you can recursively walk an ElementTree tree and replace every element's tag with its non-namespace-qualified name using the expression above.
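A minimal sketch of that recursive walk, assuming the tree is modified in place; strip_namespaces and the sample XML are hypothetical and only for illustration. Be aware that this discards the namespace information described above: attribute names such as xsi:schemaLocation stay qualified, and serialising the tree afterwards would no longer put the elements in the POM namespace.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(elem: ET.Element) -> None:
    """Recursively replace each element's tag with its local name, in place."""
    # Comments or processing instructions (if the parser kept them) have a
    # non-string tag, so only rewrite string tags.
    if isinstance(elem.tag, str):
        elem.tag = elem.tag.split("}", 1)[-1]
    for child in elem:
        strip_namespaces(child)


# Hypothetical sample document in the default POM namespace.
data = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <dependencies>
    <dependency><artifactId>junit</artifactId></dependency>
  </dependencies>
</project>"""

root = ET.fromstring(data)
strip_namespaces(root)

# root.iter() walks the whole tree in document order, starting with root itself.
print([elem.tag for elem in root.iter()])
```

If you only need the short names for printing, calling elem.tag.split("}", 1)[-1] inside a loop over root.iter() gives the same names without modifying the tree.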
Upvotes: 1