titus
Reputation: 426

Python2 extracting tags from xml

I have an XML document that I need to parse, but I am stuck at the very beginning. Here is part of the XML file.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

I want to print out the element tags only. I am using this piece of code from the Python docs, entering these commands at the Python interpreter.

import xml.etree.ElementTree as ET

tree = ET.parse('pom.xml')
root = tree.getroot()
# Alternative from the docs, for XML held in a string: root = ET.fromstring(data)
root.tag

root.tag returns this

{http://maven.apache.org/POM/4.0.0}project

Shouldn't the expected result be just

project

?

Upvotes: 0

Views: 64

Answers (1)

AKX
Reputation: 169398

Python is parsing your XML in a way that keeps the declared namespaces and thus does not lose data, so the expected result is not just project :)

The {http://maven.apache.org/POM/4.0.0}project you see is a namespace-qualified name for the tag.

Even though the start tag <project does not carry a namespace prefix, the xmlns="http://maven.apache.org/POM/4.0.0" attribute that immediately follows declares a default namespace: every tag without an explicit prefix, on that element and on its descendants, belongs to that namespace.
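To see the default namespace reach child elements too, here is a small sketch. The POM fragment and its artifactId value are made up for illustration and are not taken from the question's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal POM fragment (not the asker's real pom.xml).
POM = (
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<artifactId>demo</artifactId>'
    '</project>'
)

root = ET.fromstring(POM)
child = root[0]

# Both the root and its child carry the default namespace.
tags = [root.tag, child.tag]
print(tags)

# Lookups therefore need the namespace-qualified name as well.
found = root.find("{http://maven.apache.org/POM/4.0.0}artifactId")
print(found.text)
```

A plain root.find("artifactId") would return None here, because no element with that unqualified name exists in the tree.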

If you absolutely want a non-namespace-qualified name, you can of course do tag_name = element.tag.split("}", 1)[-1]. (This should be safe for non-namespace-qualified names due to the -1 indexing.)
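As a sketch, that expression can be wrapped in a small helper. The name local_name is just an illustrative choice, not something ElementTree provides.

```python
def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    # split("}", 1) yields ['{ns', 'name'] for qualified tags and
    # ['name'] for plain ones, so [-1] is the local name either way.
    return tag.split("}", 1)[-1]


qualified = local_name("{http://maven.apache.org/POM/4.0.0}project")
plain = local_name("project")
print(qualified)
print(plain)
```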

And of course you can recursively walk an ElementTree tree and replace every element's tag with its non-namespace-qualified name using the above expression, if you really, really want to.
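A minimal sketch of such a walk, assuming the tree holds only ordinary elements. strip_namespaces is a hypothetical helper name, and the sample XML is invented for the demo. Element.iter() does the depth-first traversal, so no explicit recursion is needed.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite every tag in the tree, in place, to its local name."""
    for element in root.iter():  # root itself, then all descendants
        element.tag = element.tag.split("}", 1)[-1]
    return root


# Hypothetical sample document, not the asker's real pom.xml.
root = ET.fromstring(
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<artifactId>demo</artifactId>'
    '</project>'
)
strip_namespaces(root)
names = [element.tag for element in root.iter()]
print(names)
```

Note that this discards information: once the namespaces are gone, same-named elements from different namespaces can no longer be told apart, and namespace-qualified attribute names are left untouched by this sketch.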

Upvotes: 1
