Reputation: 12152
I use the xml library in Python 3.5 for reading and writing an XML file. I don't modify the file; I just open it and write it back. But the library modifies the file.
This is the example file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
<title>Der Eisbär</title>
<ids>
<entry>
<key>tmdb</key>
<value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
</entry>
<entry>
<key>imdb</key>
<value xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">tt0167132</value>
</entry>
</ids>
</movie>
This is the code
import xml.etree.ElementTree as ET
tree = ET.parse('x.nfo')
tree.write('y.nfo', encoding='utf-8')
And the xml-file becomes this
<movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<title>Der Eisbär</title>
<ids>
<entry>
<key>tmdb</key>
<value xsi:type="xs:int">9321</value>
</entry>
<entry>
<key>imdb</key>
<value xsi:type="xs:string">tt0167132</value>
</entry>
</ids>
</movie>
The <movie> tag in line 2 now has attributes.
The <value> tags in lines 7 and 11 now have fewer attributes.
Upvotes: 7
Views: 1218
Reputation: 50947
Note that "xml package" and "the xml library" are ambiguous. There are several XML-related modules in the standard library: https://docs.python.org/3/library/xml.html.
Why is it modified?
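ElementTree moves namespace declarations to the root element, and it removes declarations for namespaces that aren't actually used in the document. Here the xs prefix appears only inside an attribute value (xsi:type="xs:int"), which ElementTree does not count as a use of the namespace, so its declaration is dropped.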
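The behaviour can be reproduced without any files. The following is a minimal sketch using a shortened version of the question's document (the SOURCE string is an assumption made for illustration, not the full x.nfo file). It shows that the parser stores the attribute name in {uri}local form, so the serializer has to declare the prefix again and does so on the root element.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's file (illustrative only)
SOURCE = (
    '<movie><value xsi:type="xs:int" '
    'xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value></movie>'
)

root = ET.fromstring(SOURCE)

# The prefix is not kept after parsing; the attribute name is
# expanded to "{namespace-uri}localname".
attrib = root.find("value").attrib
print(attrib)

# On output, the xsi declaration is written on the root element, and the
# xs declaration is missing because "xs" only occurs inside a value.
output = ET.tostring(root, encoding="unicode")
print(output)
```

The attribute value "xs:int" is plain text as far as ElementTree is concerned, which is why nothing tells the serializer that the xs prefix is still needed.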
Why does ElementTree do this? I don't know, but perhaps it is a way to make the implementation simpler.
How can I prevent this? E.g., I just want to replace a specific tag or its value in a quite complex XML file without losing any other information.
I don't think there is a way to prevent this. The issue has been brought up before in very similar questions.
My suggestion is to use lxml instead of ElementTree. With lxml, the namespace declarations will remain where they occur in the original file.
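A minimal sketch of the lxml approach, assuming lxml is installed (it is a third-party package) and again using a shortened, illustrative version of the document rather than the real file. It changes one element's text and serializes the tree; the "themoviedb" replacement value is made up for the example.

```python
from lxml import etree  # third-party: pip install lxml

# Shortened stand-in for the question's file (illustrative only)
SOURCE = b"""<movie>
  <ids>
    <entry>
      <key>tmdb</key>
      <value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
    </entry>
  </ids>
</movie>"""

root = etree.fromstring(SOURCE)

# Modify a single element and leave everything else untouched
root.find("ids/entry/key").text = "themoviedb"

# Both namespace declarations stay on <value>; <movie> gets none
result = etree.tostring(root, encoding="unicode")
print(result)
```

The output is not guaranteed to be byte-for-byte identical to the input (for example, the order of declarations and attributes within a tag may differ), but the declarations stay on the elements where they were written. For files, etree.parse() and tree.write() work much like their ElementTree counterparts.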
Line 1 is gone.
That line is the XML declaration. It is recommended but not mandatory to have one.
If you always want an XML declaration, use xml_declaration=True in the write() method call.
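As a small sketch of that option, the snippet below writes to an in-memory buffer instead of y.nfo so that it runs without any files; the one-element document is a stand-in for the real one.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in document (illustrative only)
tree = ET.ElementTree(ET.fromstring("<movie><title>Der Eisbär</title></movie>"))

# Write to a bytes buffer; with a real file, pass the filename instead
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)

data = buffer.getvalue()
print(data.decode("utf-8"))
```

The declaration written this way contains only the version and the encoding. ElementTree's write() has no option for the standalone="yes" part of the original declaration, so that attribute is not restored.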
Upvotes: 6