FirefoxMetzger

Reputation: 3260

Why does ElementTree eat/ignore namespaces (in attribute values)?

I'm trying to read XML with ElementTree and write the result back to disk. My long-term goal is to prettify the XML this way. However, with my naive approach, ElementTree eats most of the namespace declarations in the document and I don't understand why. Here is an example:

test.xsd

<?xml version='1.0' encoding='UTF-8'?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
    xmlns='sdformat/pose' targetNamespace='sdformat/pose'
    xmlns:pose='sdformat/pose'
    xmlns:types='http://sdformat.org/schemas/types.xsd'>

<xs:import namespace='sdformat/pose' schemaLocation='./pose.xsd'/>

<xs:element name='pose' type='poseType' />

<xs:simpleType name='string'><xs:restriction base='xs:string' /></xs:simpleType>
<xs:simpleType name='pose'><xs:restriction base='types:pose' /></xs:simpleType>

<xs:complexType name='poseType'>
    <xs:simpleContent>
      <xs:extension base="pose">
    <xs:attribute name='relative_to' type='string' use='optional' default=''>
    </xs:attribute>

      </xs:extension>
    </xs:simpleContent>
</xs:complexType>


</xs:schema>

test.py

from xml.etree import ElementTree

ElementTree.register_namespace("types", "http://sdformat.org/schemas/types.xsd")
ElementTree.register_namespace("pose", "sdformat/pose")
ElementTree.register_namespace("xs", "http://www.w3.org/2001/XMLSchema")

tree = ElementTree.parse("test.xsd")
tree.write("test_out.xsd")

Produces test_out.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="sdformat/pose">

<xs:import namespace="sdformat/pose" schemaLocation="./pose.xsd" />

<xs:element name="pose" type="poseType" />

<xs:simpleType name="string"><xs:restriction base="xs:string" /></xs:simpleType>
<xs:simpleType name="pose"><xs:restriction base="types:pose" /></xs:simpleType>

<xs:complexType name="poseType">
    <xs:simpleContent>
      <xs:extension base="pose">
    <xs:attribute name="relative_to" type="string" use="optional" default="">
    </xs:attribute>

      </xs:extension>
    </xs:simpleContent>
</xs:complexType>


</xs:schema>

Notice how test_out.xsd is missing every namespace declaration from test.xsd except xmlns:xs. I would expect the two files to be identical. I verified that test.xsd is valid XML by validating it: it validates, with the exception of my choice of namespace URI, which I think shouldn't matter.


Update:

Based on mzji's comment, I realized that this only happens for namespaces that are used solely in attribute values. With this in mind, I can manually add the namespaces like so:

from xml.etree import ElementTree

namespaces = {
    "types": "http://sdformat.org/schemas/types.xsd",
    "pose": "sdformat/pose",
    "xs": "http://www.w3.org/2001/XMLSchema"
}

for prefix, ns in namespaces.items():
    ElementTree.register_namespace(prefix, ns)

tree = ElementTree.parse("test.xsd")
root = tree.getroot()

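# Walk the whole tree and declare every known prefix used in an attribute value.
queue = [root]
while queue:
    element: ElementTree.Element = queue.pop()
    # Copy the values first: root.attrib may be modified inside the loop.
    for value in list(element.attrib.values()):
        prefix, sep, _ = value.partition(":")
        if not sep or prefix not in namespaces:
            continue  # no (known) namespace prefix, nothing to do
        if prefix == "xs":
            continue  # already declared, because element names use it
        root.attrib[f"xmlns:{prefix}"] = namespaces[prefix]

    queue.extend(element)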

tree.write("test_out.xsd")

While this solves the problem, it is quite an ugly solution. I also still don't understand why this happens in the first place, so it doesn't answer the question.

Upvotes: 1

Views: 440

Answers (1)

kimbert

Reputation: 2422

There is a valid reason for this behaviour, but it requires a good understanding of XML Schema concepts.

First, some important facts:

  • Your XML document is not just any old XML document. It is an XSD.
  • An XSD is itself described by a schema (see the 'schema for schema').
  • The attribute xs:restriction/@base is not an xs:string. Its type is xs:QName.

Based on the above facts, we can assert the following:

  • If test.xsd is parsed as an XML document, but without knowledge of the 'schema for schema', then the value of the base attribute will be treated as a plain string (technically, as CDATA).
  • If test.xsd is parsed using a validating XML parser, with the 'schema for schema' as the XSD, then the value of the base attribute will be parsed as an xs:QName.
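The first case is easy to observe. Below is a minimal sketch, assuming a cut-down inline stand-in for test.xsd rather than the original file: ElementTree resolves the prefix in the element name, but hands the base attribute back as an ordinary Python string.

```python
from xml.etree import ElementTree as ET

# Cut-down stand-in for test.xsd (an assumption made for illustration).
source = (
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' "
    "xmlns:types='http://sdformat.org/schemas/types.xsd'>"
    "<xs:restriction base='types:pose'/>"
    "</xs:schema>"
)

root = ET.fromstring(source)
restriction = root[0]
base = restriction.get("base")

# The element name is expanded to {namespace-uri}local-name ...
print(restriction.tag)  # {http://www.w3.org/2001/XMLSchema}restriction
# ... while the attribute value keeps its prefix and stays a plain str.
print(repr(base))       # 'types:pose'
```

After parsing, nothing in the tree records that 'types' was ever a namespace prefix, so the writer has no reason to declare it.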

When ElementTree writes the output XML, its behaviour should depend on the data type of base. If base is a QName then ElementTree should detect that it is using the namespace prefix 'types' and it should emit the corresponding namespace declaration.

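If you are not supplying the 'schema for schema' when parsing test.xsd, then ElementTree is off the hook, because it cannot possibly know that base is supposed to be interpreted as a QName. Note that xml.etree.ElementTree is a non-validating parser and offers no way to supply a schema, so in practice this is always the case.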
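For what it's worth, ElementTree's serializer does emit the declaration when an attribute value is an actual QName object instead of a string. The sketch below again assumes a cut-down inline stand-in for test.xsd, and it sets base by hand with ElementTree.QName; the parser never does this on its own.

```python
from xml.etree import ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
TYPES = "http://sdformat.org/schemas/types.xsd"

ET.register_namespace("xs", XS)
ET.register_namespace("types", TYPES)

# Cut-down stand-in for test.xsd (an assumption made for illustration).
source = (
    f"<xs:schema xmlns:xs='{XS}' xmlns:types='{TYPES}'>"
    "<xs:simpleType name='pose'>"
    "<xs:restriction base='types:pose'/>"
    "</xs:simpleType>"
    "</xs:schema>"
)

root = ET.fromstring(source)

# As parsed: base is a plain string, so xmlns:types is dropped on output.
plain = ET.tostring(root, encoding="unicode")

# Replace the string with a QName object: the serializer now sees that the
# 'types' namespace is in use and declares it on the root element.
restriction = root.find(f".//{{{XS}}}restriction")
restriction.set("base", ET.QName(TYPES, "pose"))
typed = ET.tostring(root, encoding="unicode")

print(plain)
print(typed)
```

Doing this for a whole schema would mean knowing which attributes are QName-typed (base, type, ref and so on), which is exactly the knowledge the 'schema for schema' carries.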

Upvotes: 2
