Reputation: 3912
I have a large number of XML files in which I want to make the following changes:
Create a new element (let's call it new_element) under the root element.
Find another, specific element, which is more deeply nested (let's call it existing_element), and move it so that it becomes a child of new_element.
I want to do this in the least intrusive way possible, such that the diff between the old file and the new file only shows changes on rows that either belong to the newly created element (and hence have been added to the file) or that used to belong to the element that was moved (and hence have been removed from the file). I want to do this using Python 3.
However, when I try just reading one of the XML files with xml.dom.minidom, writing what I just read to a new file, and diffing the two files, every single line gets marked as changed (probably because they contain different types of newlines). Besides, when I look at the contents of the two files, I see that the encoding specification in the XML declaration, as well as the newline after the declaration, have disappeared, and attributes in tags all throughout the document have had their order shuffled within the tag.
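The round trip described above can be reproduced on a tiny document. This is only a sketch: SAMPLE is a made-up stand-in for the real files, and whether attribute order is kept depends on the Python version (older releases sorted attributes on output, as far as I know).

```python
from xml.dom import minidom

# Hypothetical stand-in for one of the real files.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8"?>\n<root b="1" a="2"/>\n'

dom = minidom.parseString(SAMPLE)

# Without an encoding argument, toxml() returns str and writes a
# declaration that carries no encoding attribute.
as_text = dom.toxml()

# With an encoding argument, toxml() returns bytes and the declaration
# names that encoding; whitespace outside the root element is still lost.
as_bytes = dom.toxml(encoding="UTF-8")

print(as_text)
print(as_bytes)
```

Passing encoding="UTF-8" brings the encoding back into the declaration, but the output is still not byte-identical to the input, so the diff problem remains.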
The story is very similar when using xml.etree.ElementTree, only that now the entire XML declaration disappears, all tag names are preceded by "ns0:" (for some reason), and some attribute names are followed by ":ns0".
None of these "extra" modifications to the XML files are desirable, since I want to be able to diff the old file and the new file and be able to easily see what has changed and what has not.
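The "ns0:" prefixing can be reproduced with a small sketch. ElementTree stores names in the form {uri}tag and, when serialising, invents a prefix such as ns0 for any namespace that has no registered prefix. SAMPLE and the roundtrip helper below are illustrative names, not part of the real files.

```python
import xml.etree.ElementTree as ET

COLLADA_NS = "http://www.collada.org/2005/11/COLLADASchema"

# Hypothetical, heavily shortened stand-in for a real file.
SAMPLE = '<COLLADA xmlns="%s" version="1.4.1"><asset/></COLLADA>' % COLLADA_NS


def roundtrip(xml_text):
    """Parse the text and serialise it again without touching the tree."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


# No prefix is registered for the namespace yet, so one is generated.
before = roundtrip(SAMPLE)

# Registering the empty prefix makes ElementTree emit a default namespace.
# Note that this changes module-wide state.
ET.register_namespace("", COLLADA_NS)
after = roundtrip(SAMPLE)

print(before)
print(after)
```

Registering the namespace removes the generated prefixes, but the missing declaration and other serialisation differences are untouched by it, so this alone does not give a clean diff.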
So, is there some simple way to create a new XML file based on another XML file that only introduces changes to lines that really are supposed to be changed and leave all other lines untouched, and that doesn't involve writing your own code to parse the XML data?
Edit: Here is the structure of the file I want to process (since I don't know what will cause any suggested code to work or not to work with my XML files, I have only removed the data stored in three different places in the file, which I have replaced with "*data*", and kept everything else exactly as it is):
<?xml version="1.0" encoding="UTF-8"?>
<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
<asset>
<contributor/>
<created>2017-01-23T12:01:30Z</created>
<modified>2017-01-23T12:01:30Z</modified>
<unit/>
<up_axis>Z_UP</up_axis>
</asset>
<library_visual_scenes>
<visual_scene id="defaultScene">
<node id="sceneRoot">
<instance_geometry url="#geometry">
<bind_material>
<technique_common>
<instance_material symbol="geometry_material" target="#material">
<bind_vertex_input semantic="texcoord0" input_semantic="TEXCOORD" input_set="0"/>
</instance_material>
</technique_common>
</bind_material>
</instance_geometry>
</node>
</visual_scene>
</library_visual_scenes>
<library_geometries>
<geometry id="geometry">
<mesh>
<source id="geometry-positions">
<float_array id="geometry-positions-array" count="673731">*data*</float_array>
<technique_common>
<accessor count="224577" source="#geometry-positions-array" stride="3">
<param name="X" type="float"/>
<param name="Y" type="float"/>
<param name="Z" type="float"/>
</accessor>
</technique_common>
</source>
<source id="geometry-texcoord_0">
<float_array id="geometry-texcoord_0-array" count="449154">*data*</float_array>
<technique_common>
<accessor count="224577" source="#geometry-texcoord_0-array" stride="2">
<param name="S" type="float"/>
<param name="T" type="float"/>
</accessor>
</technique_common>
</source>
<vertices id="geometry-vertices">
<input semantic="POSITION" source="#geometry-positions"/>
</vertices>
<triangles count="329753" material="geometry_material">
<input offset="0" semantic="VERTEX" source="#geometry-vertices" set="0"/>
<input offset="1" semantic="TEXCOORD" source="#geometry-texcoord_0" set="0"/>
<p>*data*</p>
</triangles>
</mesh>
</geometry>
</library_geometries>
<library_materials>
<material id="material">
<instance_effect url="#material_effect"/>
</material>
</library_materials>
<library_effects>
<effect id="material_effect">
<profile_COMMON>
<image id="material_effect-image" height="0" width="0">
<init_from>Tile_+037_+047_0.jpg</init_from>
</image>
<newparam sid="material_effect-surface">
<surface type="2D">
<init_from>material_effect-image</init_from>
</surface>
</newparam>
<newparam sid="material_effect-sampler">
<sampler2D>
<source>material_effect-surface</source>
<wrap_s>CLAMP</wrap_s>
<wrap_t>CLAMP</wrap_t>
<minfilter>LINEAR_MIPMAP_LINEAR</minfilter>
<magfilter>LINEAR</magfilter>
<border_color>0 0 0 0</border_color>
</sampler2D>
</newparam>
<technique sid="t0">
<phong>
<emission>
<color>0 0 0 1</color>
</emission>
<ambient>
<color>1 1 1 1</color>
</ambient>
<diffuse>
<texture texture="material_effect-sampler" texcoord="texcoord0">
<extra type="color">
<technique profile="SCEI">
<color>1 1 1 1</color>
</technique>
</extra>
</texture>
</diffuse>
<specular>
<color>0 0 0 1</color>
</specular>
<shininess>
<float>0</float>
</shininess>
</phong>
</technique>
</profile_COMMON>
</effect>
</library_effects>
<scene>
<instance_visual_scene url="#defaultScene"/>
</scene>
</COLLADA>
This is what I want the XML to turn into:
<?xml version="1.0" encoding="UTF-8"?>
<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
<asset>
<contributor/>
<created>2017-01-23T12:01:30Z</created>
<modified>2017-01-23T12:01:30Z</modified>
<unit/>
<up_axis>Z_UP</up_axis>
</asset>
<library_visual_scenes>
<visual_scene id="defaultScene">
<node id="sceneRoot">
<instance_geometry url="#geometry">
<bind_material>
<technique_common>
<instance_material symbol="geometry_material" target="#material">
<bind_vertex_input semantic="texcoord0" input_semantic="TEXCOORD" input_set="0"/>
</instance_material>
</technique_common>
</bind_material>
</instance_geometry>
</node>
</visual_scene>
</library_visual_scenes>
<library_geometries>
<geometry id="geometry">
<mesh>
<source id="geometry-positions">
<float_array id="geometry-positions-array" count="673731">*data*</float_array>
<technique_common>
<accessor count="224577" source="#geometry-positions-array" stride="3">
<param name="X" type="float"/>
<param name="Y" type="float"/>
<param name="Z" type="float"/>
</accessor>
</technique_common>
</source>
<source id="geometry-texcoord_0">
<float_array id="geometry-texcoord_0-array" count="449154">*data*</float_array>
<technique_common>
<accessor count="224577" source="#geometry-texcoord_0-array" stride="2">
<param name="S" type="float"/>
<param name="T" type="float"/>
</accessor>
</technique_common>
</source>
<vertices id="geometry-vertices">
<input semantic="POSITION" source="#geometry-positions"/>
</vertices>
<triangles count="329753" material="geometry_material">
<input offset="0" semantic="VERTEX" source="#geometry-vertices" set="0"/>
<input offset="1" semantic="TEXCOORD" source="#geometry-texcoord_0" set="0"/>
<p>*data*</p>
</triangles>
</mesh>
</geometry>
</library_geometries>
<library_materials>
<material id="material">
<instance_effect url="#material_effect"/>
</material>
</library_materials>
<library_effects>
<effect id="material_effect">
<profile_COMMON>
<newparam sid="material_effect-surface">
<surface type="2D">
<init_from>material_effect-image</init_from>
</surface>
</newparam>
<newparam sid="material_effect-sampler">
<sampler2D>
<source>material_effect-surface</source>
<wrap_s>CLAMP</wrap_s>
<wrap_t>CLAMP</wrap_t>
<minfilter>LINEAR_MIPMAP_LINEAR</minfilter>
<magfilter>LINEAR</magfilter>
<border_color>0 0 0 0</border_color>
</sampler2D>
</newparam>
<technique sid="t0">
<phong>
<emission>
<color>0 0 0 1</color>
</emission>
<ambient>
<color>1 1 1 1</color>
</ambient>
<diffuse>
<texture texture="material_effect-sampler" texcoord="texcoord0">
<extra type="color">
<technique profile="SCEI">
<color>1 1 1 1</color>
</technique>
</extra>
</texture>
</diffuse>
<specular>
<color>0 0 0 1</color>
</specular>
<shininess>
<float>0</float>
</shininess>
</phong>
</technique>
</profile_COMMON>
</effect>
</library_effects>
<scene>
<instance_visual_scene url="#defaultScene"/>
</scene>
<library_images>
<image id="material_effect-image" height="0" width="0">
<init_from>Tile_+037_+047_0.jpg</init_from>
</image>
</library_images>
</COLLADA>
Note how in the second piece of XML, a tag named library_images has been created under the root element (where under the root element is not important, as long as it is a direct child of it), and the element image has been moved into it.
Upvotes: 1
Views: 331
Reputation: 107687
Yes, there is a way: XSLT, the special-purpose language designed to transform XML files from one structure to another, or to other formats including HTML, TXT/CSV, and even JSON. Python's third-party module lxml can run XSLT 1.0 scripts (the built-in minidom and etree modules cannot). Other languages (Java, C#, PHP, VB) and processors (Saxon, Xalan, libxslt, .NET) can also run such scripts, some of them XSLT 2.0 and 3.0 as well, and Python can call these external tools from the command line.
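Specifically, run the Identity Transform to copy every node of the document as is, and then apply your needed changes to specific nodes. One challenge is the default namespace: matching elements in it requires declaring a prefix (doc here) bound to the namespace URI, and the new element has to be created in that same namespace. Note that the stylesheet below names the new element library-images; change the name attribute to library_images to match the target in the question: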
XSLT (save as .xsl)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:doc="http://www.collada.org/2005/11/COLLADASchema">
  <xsl:output omit-xml-declaration="no" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- ADD <new_element> AS CHILD TO ROOT -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <xsl:element name="library-images" namespace="http://www.collada.org/2005/11/COLLADASchema">
        <xsl:copy-of select="descendant::doc:profile_COMMON/doc:image"/>
      </xsl:element>
    </xsl:copy>
  </xsl:template>

  <!-- REMOVE NODE IN DOCUMENT -->
  <xsl:template match="doc:profile_COMMON/doc:image"/>
</xsl:stylesheet>
Python
import lxml.etree as et

# LOAD XML AND XSL
doc = et.parse('my_file.xml')
xsl = et.parse('my_script.xsl')

# CONFIGURE TRANSFORMER
transform = et.XSLT(xsl)

# RUN TRANSFORMATION
result = transform(doc)

# PRINT RESULT
print(result)

# SAVE TO FILE
with open('output.xml', 'wb') as f:
    f.write(result)
Output
<?xml version="1.0" encoding="UTF-8"?>
<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
<asset>
<contributor/>
<created>2017-01-23T12:01:30Z</created>
<modified>2017-01-23T12:01:30Z</modified>
<unit/>
<up_axis>Z_UP</up_axis>
</asset>
<library_visual_scenes>
<visual_scene id="defaultScene">
<node id="sceneRoot">
<instance_geometry url="#geometry">
<bind_material>
<technique_common>
<instance_material symbol="geometry_material" target="#material">
<bind_vertex_input semantic="texcoord0" input_semantic="TEXCOORD" input_set="0"/>
</instance_material>
</technique_common>
</bind_material>
</instance_geometry>
</node>
</visual_scene>
</library_visual_scenes>
<library_geometries>
<geometry id="geometry">
<mesh>
<source id="geometry-positions">
<float_array id="geometry-positions-array" count="673731">*data*</float_array>
<technique_common>
<accessor count="224577" source="#geometry-positions-array" stride="3">
<param name="X" type="float"/>
<param name="Y" type="float"/>
<param name="Z" type="float"/>
</accessor>
</technique_common>
</source>
<source id="geometry-texcoord_0">
<float_array id="geometry-texcoord_0-array" count="449154">*data*</float_array>
<technique_common>
<accessor count="224577" source="#geometry-texcoord_0-array" stride="2">
<param name="S" type="float"/>
<param name="T" type="float"/>
</accessor>
</technique_common>
</source>
<vertices id="geometry-vertices">
<input semantic="POSITION" source="#geometry-positions"/>
</vertices>
<triangles count="329753" material="geometry_material">
<input offset="0" semantic="VERTEX" source="#geometry-vertices" set="0"/>
<input offset="1" semantic="TEXCOORD" source="#geometry-texcoord_0" set="0"/>
<p>*data*</p>
</triangles>
</mesh>
</geometry>
</library_geometries>
<library_materials>
<material id="material">
<instance_effect url="#material_effect"/>
</material>
</library_materials>
<library_effects>
<effect id="material_effect">
<profile_COMMON>
<newparam sid="material_effect-surface">
<surface type="2D">
<init_from>material_effect-image</init_from>
</surface>
</newparam>
<newparam sid="material_effect-sampler">
<sampler2D>
<source>material_effect-surface</source>
<wrap_s>CLAMP</wrap_s>
<wrap_t>CLAMP</wrap_t>
<minfilter>LINEAR_MIPMAP_LINEAR</minfilter>
<magfilter>LINEAR</magfilter>
<border_color>0 0 0 0</border_color>
</sampler2D>
</newparam>
<technique sid="t0">
<phong>
<emission>
<color>0 0 0 1</color>
</emission>
<ambient>
<color>1 1 1 1</color>
</ambient>
<diffuse>
<texture texture="material_effect-sampler" texcoord="texcoord0">
<extra type="color">
<technique profile="SCEI">
<color>1 1 1 1</color>
</technique>
</extra>
</texture>
</diffuse>
<specular>
<color>0 0 0 1</color>
</specular>
<shininess>
<float>0</float>
</shininess>
</phong>
</technique>
</profile_COMMON>
</effect>
</library_effects>
<scene>
<instance_visual_scene url="#defaultScene"/>
</scene>
<library-images>
<image id="material_effect-image" height="0" width="0">
<init_from>Tile_+037_+047_0.jpg</init_from>
</image>
</library-images>
</COLLADA>
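Since the goal in the question is a diff that touches only the moved and the newly created lines, it may help to check a transformed file against its original programmatically. The following is a minimal sketch using the standard-library difflib; changed_lines and the two short strings are hypothetical and only stand in for the real file contents.

```python
import difflib
from typing import List, Tuple


def changed_lines(old_text: str, new_text: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) lines, each stripped of surrounding whitespace."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:].strip())
        elif line.startswith("+ "):
            added.append(line[2:].strip())
        # Lines starting with "  " (unchanged) or "? " (hints) are ignored.
    return removed, added


# Hypothetical miniature of the before/after documents.
old = "<root>\n<a>\n<image/>\n</a>\n</root>\n"
new = "<root>\n<a>\n</a>\n<library_images>\n<image/>\n</library_images>\n</root>\n"

removed, added = changed_lines(old, new)
print("removed:", removed)
print("added:", added)
```

Because the lines are compared after splitlines() and stripped only for reporting, differences in indentation still show up as changed lines, which is what you want when checking how intrusive a rewrite was. Differences that are purely in the newline style are hidden by splitlines(), so compare the raw bytes as well if that matters.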
Upvotes: 1