Reputation: 3260
I'm trying to read XML with ElementTree and write the result back to disk. My long-term goal is to prettify the XML this way. However, in my naive approach, ElementTree eats all the namespace declarations in the document and I don't understand why. Here is an example:
test.xsd
<?xml version='1.0' encoding='UTF-8'?>
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
xmlns='sdformat/pose' targetNamespace='sdformat/pose'
xmlns:pose='sdformat/pose'
xmlns:types='http://sdformat.org/schemas/types.xsd'>
<xs:import namespace='sdformat/pose' schemaLocation='./pose.xsd'/>
<xs:element name='pose' type='poseType' />
<xs:simpleType name='string'><xs:restriction base='xs:string' /></xs:simpleType>
<xs:simpleType name='pose'><xs:restriction base='types:pose' /></xs:simpleType>
<xs:complexType name='poseType'>
<xs:simpleContent>
<xs:extension base="pose">
<xs:attribute name='relative_to' type='string' use='optional' default=''>
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:schema>
test.py
from xml.etree import ElementTree
ElementTree.register_namespace("types", "http://sdformat.org/schemas/types.xsd")
ElementTree.register_namespace("pose", "sdformat/pose")
ElementTree.register_namespace("xs", "http://www.w3.org/2001/XMLSchema")
tree = ElementTree.parse("test.xsd")
tree.write("test_out.xsd")
This produces test_out.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="sdformat/pose">
<xs:import namespace="sdformat/pose" schemaLocation="./pose.xsd" />
<xs:element name="pose" type="poseType" />
<xs:simpleType name="string"><xs:restriction base="xs:string" /></xs:simpleType>
<xs:simpleType name="pose"><xs:restriction base="types:pose" /></xs:simpleType>
<xs:complexType name="poseType">
<xs:simpleContent>
<xs:extension base="pose">
<xs:attribute name="relative_to" type="string" use="optional" default="">
</xs:attribute>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:schema>
Notice how test_out.xsd is missing every namespace declaration from test.xsd except xmlns:xs. I would expect the two files to be identical. I verified that test.xsd is valid XML by validating it. It validates with the exception of my choice of namespace URI, which I think shouldn't matter.
Update:
Based on mzji's comment I realized that this only happens for namespaces whose prefixes appear solely in attribute values. With this in mind, I can manually add the namespaces like so:
from xml.etree import ElementTree
namespaces = {
    "types": "http://sdformat.org/schemas/types.xsd",
    "pose": "sdformat/pose",
    "xs": "http://www.w3.org/2001/XMLSchema",
}
for prefix, ns in namespaces.items():
    ElementTree.register_namespace(prefix, ns)
tree = ElementTree.parse("test.xsd")
root = tree.getroot()
queue = [root]
while queue:
    element: ElementTree.Element = queue.pop()
    # copy the values: root.attrib may grow while the root itself is being scanned
    for value in list(element.attrib.values()):
        try:
            prefix, _ = value.split(":")
        except ValueError:
            # not a single "prefix:name" pair, nothing to do
            continue
        if prefix == "xs":
            continue  # ElementTree already declares the XMLSchema namespace
        if prefix in namespaces:
            root.attrib[f"xmlns:{prefix}"] = namespaces[prefix]
    queue.extend(element)  # visit the children as well
tree.write("test_out.xsd")
While this solves the problem, it is quite an ugly solution. I also still don't understand why this happens in the first place, so it doesn't answer the question.
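A somewhat less manual variant is to let the parser report the declarations instead of hard-coding them. The sketch below is only one possible approach, not something taken from the ElementTree documentation as a recommended recipe: it uses the "start-ns" events of ElementTree.iterparse to collect every (prefix, uri) pair, then re-adds as plain attributes the ones ElementTree would not emit itself. The helper names parse_with_declarations, namespaces_used_in_names and restore_declarations are made up for this example, the sample document is a shortened stand-in for test.xsd, and the code assumes that all declarations sit on the root element and that every element is in some namespace (otherwise re-adding a default xmlns would change the meaning of unprefixed elements).

```python
import io
from xml.etree import ElementTree

# Shortened stand-in for test.xsd (assumption: all declarations are on the root).
SAMPLE = b"""<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
    xmlns='sdformat/pose' targetNamespace='sdformat/pose'
    xmlns:types='http://sdformat.org/schemas/types.xsd'>
  <xs:simpleType name='pose'><xs:restriction base='types:pose'/></xs:simpleType>
</xs:schema>"""


def parse_with_declarations(source):
    """Parse source and return (root, [(prefix, uri), ...])."""
    declarations = []
    root = None
    for event, payload in ElementTree.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            declarations.append(payload)  # payload is a (prefix, uri) tuple
        elif root is None:
            root = payload  # the first "start" event belongs to the root
    return root, declarations


def namespaces_used_in_names(root):
    """Collect the namespace URIs that occur in element or attribute names."""
    uris = set()
    for element in root.iter():
        for name in [element.tag, *element.attrib]:
            if isinstance(name, str) and name.startswith("{"):
                uris.add(name[1:].split("}", 1)[0])
    return uris


def restore_declarations(root, declarations):
    """Make sure every collected declaration shows up in the serialized output."""
    used = namespaces_used_in_names(root)
    for prefix, uri in declarations:
        if uri in used:
            # ElementTree declares this namespace itself; only fix the prefix.
            # Note that register_namespace changes module-wide state.
            ElementTree.register_namespace(prefix, uri)
        else:
            # Only referenced from attribute values: add the declaration by hand.
            root.set(f"xmlns:{prefix}" if prefix else "xmlns", uri)


root, declarations = parse_with_declarations(io.BytesIO(SAMPLE))
restore_declarations(root, declarations)
output = ElementTree.tostring(root, encoding="unicode")
print(output)
```

With a real file, the io.BytesIO(SAMPLE) argument would be replaced by the path "test.xsd". Splitting the declarations into "used in names" and "used only in values" matters because adding an xmlns:xs attribute by hand while ElementTree also writes its own would produce a duplicate attribute and therefore invalid XML.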
Upvotes: 1
Views: 440
Reputation: 2422
There is a valid reason for this behaviour, but it requires a good understanding of XML Schema concepts.
First, some important facts:
Based on the above facts, we can assert the following:
If test.xsd is parsed as an ordinary XML document, the base attribute will be treated as a string (technically, as PCDATA).
If test.xsd is parsed against the 'schema for schema', the base attribute will be parsed as xs:QName.
When ElementTree writes the output XML, its behaviour should depend on the data type of base. If base is a QName then ElementTree should detect that it is using the namespace prefix 'types' and it should emit the corresponding namespace declaration.
If you are not supplying the 'schema for schema' when parsing test.xsd then ElementTree is off the hook, because it cannot possibly know that base is supposed to be interpreted as a QName.
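The difference between names and values can be seen directly in a small experiment. This is a minimal sketch with a cut-down document of my own (not the full test.xsd): the parser expands the prefix in the element name to its namespace URI, but leaves the value of base as an opaque string, so the serializer has no reason to declare 'types'.

```python
from xml.etree import ElementTree

# Cut-down example document (an assumption for illustration, not test.xsd itself).
SAMPLE = (
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' "
    "xmlns:types='http://sdformat.org/schemas/types.xsd'>"
    "<xs:restriction base='types:pose'/>"
    "</xs:schema>"
)

root = ElementTree.fromstring(SAMPLE)
restriction = root[0]

# Element names are resolved against the declarations in scope ...
print(restriction.tag)          # {http://www.w3.org/2001/XMLSchema}restriction
# ... but the attribute value keeps its literal prefix and is never resolved.
print(restriction.get("base"))  # types:pose

ElementTree.register_namespace("xs", "http://www.w3.org/2001/XMLSchema")
ElementTree.register_namespace("types", "http://sdformat.org/schemas/types.xsd")
serialized = ElementTree.tostring(root, encoding="unicode")

# Registering 'types' does not help: only namespaces that occur in element
# or attribute names are declared when the tree is written.
print("xmlns:types" in serialized)  # False
```

As far as I know, xml.etree.ElementTree has no schema-aware parsing mode, so in practice it always ends up in the "plain string" case described above.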
Upvotes: 2