Reputation: 31258
For example, consider parsing a pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<parent>
<groupId>com.parent</groupId>
<artifactId>parent</artifactId>
<version>1.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>2.0.0</modelVersion>
<groupId>com.parent.somemodule</groupId>
<artifactId>some_module</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Some Module</name>
...
Code:
import xml.etree.ElementTree as ET
tree = ET.parse(pom)
root = tree.getroot()
groupId = root.find("groupId")
artifactId = root.find("artifactId")
Both groupId and artifactId are None. Why, when they are direct children of the root? I tried replacing root with tree (groupId = tree.find("groupId")), but that didn't change anything.
Upvotes: 1
Views: 1665
Reputation: 365915
The problem is that you don't have a child named groupId; you have a child named {http://maven.apache.org/POM/4.0.0}groupId. The default xmlns declaration on the project element puts every unprefixed element in that namespace, and etree doesn't ignore XML namespaces: it uses "universal names". See Working with Namespaces and Qualified Names in the effbot docs.
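To make that concrete, here is a minimal sketch of two ways to look up the namespaced children with ElementTree. The pom_text string is a trimmed stand-in for the file in the question, and the "m" prefix is an arbitrary name chosen for this example; neither comes from the original post.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns attribute in the question's pom.xml
POM_NS = "http://maven.apache.org/POM/4.0.0"

# Trimmed-down stand-in for the pom.xml in the question (assumption)
pom_text = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""

root = ET.fromstring(pom_text)

# Unqualified lookup: no child is literally named "groupId"
unqualified = root.find("groupId")

# Option 1: spell out the universal name, {namespace-uri}localname
group_id = root.find("{%s}groupId" % POM_NS)

# Option 2: pass a prefix-to-URI mapping; "m" is an arbitrary prefix
ns = {"m": POM_NS}
artifact_id = root.find("m:artifactId", ns)

print(unqualified)
print(group_id.text)
print(artifact_id.text)
```

With a file on disk, ET.parse(path).getroot() can stand in for ET.fromstring; the lookups stay the same.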
Upvotes: 4
Reputation: 45
Just to expand on abarnert's comment about BeautifulSoup: if you just want a quick-and-dirty solution to the problem, this is probably the fastest way to go about it. I implemented this for a personal script, where you can traverse the tree with
element = dom.getElementsByTagNameNS('*','elementname')
Note that getElementsByTagNameNS is a DOM method (available, for example, on xml.dom.minidom documents in the standard library) rather than a bs4 call. Passing '*' matches the element in ANY namespace, which is handy if you know you've only got one namespace in the file, so there's no ambiguity. It returns every matching descendant, not just direct children.
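As a rough sketch of the wildcard lookup, the example below assumes the dom object comes from xml.dom.minidom in the standard library, which provides getElementsByTagNameNS. The pom_text string is a cut-down stand-in for the question's file, not the original.

```python
from xml.dom import minidom

# Cut-down stand-in for the pom.xml in the question (assumption)
pom_text = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.parent</groupId>
  </parent>
  <groupId>com.parent.somemodule</groupId>
</project>"""

dom = minidom.parseString(pom_text)

# '*' matches any namespace; the search covers all descendants
matches = dom.getElementsByTagNameNS("*", "groupId")

# Each match is an Element whose first child is its text node
values = [node.firstChild.data for node in matches]
print(values)

# Keep only direct children of the root to drop the <parent> entry
direct = [n.firstChild.data for n in matches
          if n.parentNode is dom.documentElement]
print(direct)
```

Because the search is recursive, the groupId inside parent is returned too, so filtering on parentNode is one way to recover only the project's own value.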
Upvotes: 1