Deleted

Reputation: 1431

Python: Read and write namespaced XML using ElementTree

This XML file is named example.xml:

<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.foobar.flubber</groupId>
  <artifactId>uberportalconf</artifactId>
  <version>13-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>Environment for UberPortalConf</name>
  <description>This is the description</description>    
  <properties>
      <birduberportal.version>11</birduberportal.version>
      <promotiondevice.version>9</promotiondevice.version>
      <foobarportal.version>6</foobarportal.version>
      <eventuberdevice.version>2</eventuberdevice.version>
  </properties>
  <!-- A lot more here, but as it is irrelevant for the problem I have removed it -->
</project>

If I load the example.xml file above using ElementTree and print the root node:

>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('example.xml')
>>> print(tree.getroot())
<Element '{http://maven.apache.org/POM/4.0.0}project' at 0x26ee0f0>

I see that the Element's tag also contains the namespace http://maven.apache.org/POM/4.0.0.
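The namespace appears because ElementTree stores qualified names in the form {uri}localname. A minimal sketch of splitting such a tag, using a cut-down inline stand-in for example.xml; the helper name split_tag is hypothetical and not part of ElementTree:

```python
from xml.etree import ElementTree

# Cut-down inline stand-in for example.xml (assumption: same root element).
SAMPLE = (
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<properties><foobarportal.version>6</foobarportal.version></properties>'
    '</project>'
)


def split_tag(tag):
    """Split '{uri}local' into (uri, local); uri is '' if there is no namespace."""
    if tag.startswith('{'):
        uri, _, local = tag[1:].partition('}')
        return uri, local
    return '', tag


root = ElementTree.fromstring(SAMPLE)
namespace, local_name = split_tag(root.tag)
print(namespace)
print(local_name)
```

Reading the namespace off the root like this avoids hard-coding the URI, which is what point 2 below asks for.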

How do I:

  1. Get the foobarportal.version text, increase it by one, and write the XML file back, keeping the namespace the document had when loaded and leaving the overall XML layout unchanged.
  2. Make this work with any namespace, not just http://maven.apache.org/POM/4.0.0. I still don't want to strip the namespace, because the XML should stay the same except for the change to foobarportal.version described in 1.

My current approach is not XML-aware, but it fulfills 1 and 2 above:

  1. Grep for <foobarportal.version>(.*)</foobarportal.version>
  2. Take the contents of the match group and increase it by one.
  3. Write it back.
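For reference, the three steps above could be sketched roughly like this. The function name bump_version is hypothetical, and the pattern assumes the value is a plain integer (it uses \d+ instead of the looser .* from step 1):

```python
import re

# Captures the opening tag, the integer value, and the closing tag separately.
PATTERN = re.compile(r'(<foobarportal\.version>)(\d+)(</foobarportal\.version>)')


def bump_version(xml_text):
    """Return xml_text with the first foobarportal.version value increased by one."""
    def replace(match):
        return match.group(1) + str(int(match.group(2)) + 1) + match.group(3)
    return PATTERN.sub(replace, xml_text, count=1)


before = (
    '  <properties>\n'
    '      <foobarportal.version>6</foobarportal.version>\n'
    '  </properties>\n'
)
after = bump_version(before)
print(after)
```

Everything outside the matched number is left untouched, which is why this approach keeps the layout and namespace intact even though it knows nothing about XML.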

It would be nice to have an XML-aware solution, as it would be more robust, but ElementTree's XML namespace handling makes this more complicated.

Upvotes: 0

Views: 1031

Answers (1)

Marcin

Reputation: 49826

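If your question is simply "how do I search by a namespaced element name", then the answer is that ElementTree (like lxml) understands the {namespace}tag syntax. Note that find() searches below the element it is called on, so to reach the element you want from the root you can do:

tree.getroot().find('.//{http://maven.apache.org/POM/4.0.0}foobarportal.version')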
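Putting this together for the full round trip, here is a minimal sketch, not a definitive implementation. The function name bump_property and the inline SAMPLE document are assumptions for illustration. It reads the namespace from the root tag instead of hard-coding it, and calls ElementTree.register_namespace('', uri) so that the default namespace is written back without an ns0: prefix. Note that this registration is global to the process.

```python
import io
from xml.etree import ElementTree

# Cut-down inline stand-in for example.xml (assumption).
SAMPLE = b'''<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <properties>
      <foobarportal.version>6</foobarportal.version>
  </properties>
</project>
'''


def bump_property(source, target, local_name):
    """Increase the integer text of the first <local_name> element by one.

    source and target may be file names or binary file objects.
    """
    tree = ElementTree.parse(source)
    root = tree.getroot()

    # Derive the namespace from the root tag, whatever it happens to be.
    uri = root.tag[1:].partition('}')[0] if root.tag.startswith('{') else ''
    prefix = '{%s}' % uri if uri else ''
    if uri:
        # Serialize this namespace as the default one (no "ns0:" prefixes).
        ElementTree.register_namespace('', uri)

    element = root.find('.//%s%s' % (prefix, local_name))
    if element is None:
        raise LookupError('no element named %r' % local_name)

    element.text = str(int(element.text) + 1)
    tree.write(target, encoding='utf-8', xml_declaration=True)
    return element.text


output = io.BytesIO()
new_value = bump_property(io.BytesIO(SAMPLE), output, 'foobarportal.version')
result = output.getvalue().decode('utf-8')
print(result)
```

With a real file this would be called as bump_property('example.xml', 'example.xml', 'foobarportal.version'). The output is not guaranteed to be byte-identical to the input: the XML declaration is regenerated, and the default parser drops comments, so the regex approach from the question may still be preferable if the layout must be preserved exactly.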

Upvotes: 2
