user2250246

How to diff 2 xml files and apply the patch to a third xml file?

Let us assume there is a default version 1 XML file:

<!--
  Specification for a shirt
-->
<shirt color="red" size="L">
  <buttons count=20/>
  <pocket position="left">
    <!-- this might be removed later on -->
  </pocket>
</shirt>

When deployed in production, this file is changed (for reasons that do not matter here) as follows; let us call this version 1A:

<!-- Specification for a shirt -->
<shirt size="M" color="blue">
  <buttons count=16/>
  <pocket position="left">
    <!-- this might be removed later on -->
  </pocket>
</shirt>

Now version 2 of the XML file is released as the new default:

<!--
  Specification for a shirt
-->
<shirt color="red" size="L" vendor="xyz">
  <buttons count=16/>
  <cloth type="silk"/>
</shirt>

Now all those 1A files in production need to be changed.

The question is: how do we find the differences between the first two XML files (version 1 and 1A) and patch them into the third (version 2)? Note that the files in production may have their XML attributes in a different order; this is not a semantic change and should be ignored when computing the diff. Similarly, line breaks between two attributes should be ignored. Example:

<shirt color="red" size="L">

should be considered equivalent to:

<shirt size="L"
    color="red">
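As an illustration of this equivalence rule, here is a minimal Python sketch that compares two fragments after canonicalization. It assumes Python 3.8+ (for `xml.etree.ElementTree.canonicalize`) and well-formed input, so the start tags above are closed here. The helper name `same_xml` is made up for this example. Note that comments are dropped by default during canonicalization.

```python
import xml.etree.ElementTree as ET


def same_xml(a: str, b: str) -> bool:
    """Return True if two XML strings are equal after C14N canonicalization.

    Canonical XML writes attributes in a fixed sorted order, so attribute
    order and line breaks between attributes no longer matter.
    strip_text=True also ignores leading/trailing whitespace in text nodes.
    """
    return (ET.canonicalize(xml_data=a, strip_text=True)
            == ET.canonicalize(xml_data=b, strip_text=True))


one_line = '<shirt color="red" size="L"/>'
reordered = '<shirt size="L"\n    color="red"/>'
different = '<shirt color="blue" size="L"/>'

print(same_xml(one_line, reordered))  # attribute order and line break ignored
print(same_xml(one_line, different))  # a real value change is still detected
```

Canonicalizing both files before running a text-based diff is one way to keep purely cosmetic differences out of the result.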

I know we could write a Java program to do this, but a ready-made utility like xmldiff would be much better, because the real XML files are thousands of lines long and there are hundreds of them, with many variants running in production.

Secondly, it would be great if the comments found in version 2 were preserved by the diff/patch.

Answers (1)

alex

First, as a matter of design, I would rather make the changes at another level, in the product definition tree, and not at the XML output level.

I tried xmldiff and xmlpatch on a Debian Linux system; they visibly applied the changes from version 1 to 1A onto version 2, even when the input had line breaks inside it. First, though, your files must be well-formed XML: the value of the count attribute should be in quotes. I fixed that manually, but you could use a program for it, such as BeautifulSoup in Python.
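The quoting fix can also be scripted. The sketch below is not BeautifulSoup; it swaps in a plain regular expression from the Python standard library, and the function name `quote_bare_attributes` is made up for this example. It assumes that anything of the form ` name=value` belongs to a tag, so it could also touch such text inside comments or element content. Treat it as a starting point and check the output.

```python
import re
import xml.etree.ElementTree as ET

# A whitespace-preceded attribute name, '=', then a value that does not
# start with a quote and runs until whitespace, '/', '>' or similar.
BARE_ATTR = re.compile(r'(\s[\w:.-]+)=([^\s"\'<>/=]+)')


def quote_bare_attributes(text: str) -> str:
    """Wrap unquoted attribute values in double quotes."""
    return BARE_ATTR.sub(r'\1="\2"', text)


broken = '''<shirt color="red" size="L">
  <buttons count=20/>
</shirt>'''

fixed = quote_bare_attributes(broken)
print(fixed)

# Raises xml.etree.ElementTree.ParseError if the result is still malformed.
root = ET.fromstring(fixed)
print(root.find("buttons").get("count"))
```

Already quoted attributes are left alone because the value part of the pattern refuses to start with a quote character.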

Here is what I did; hopefully it helps. It should not be hard to automate this further to run over collections of files, as these Python programs are open source.

xmldiff ver1.xml ver1a.xml >ver1-diff
xmlpatch ver1-diff ver2.xml >ver2a.xml
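To make the idea behind these two commands concrete, here is a rough standard-library sketch of the same three-way step, restricted to attributes. It is not how xmldiff or xmlpatch work internally. The names `index_by_path` and `carry_over_attributes` are made up, elements are matched by tag path (so sibling elements with the same tag are assumed not to occur), and comments are dropped by the default parser, so it does not meet the comment-preservation wish.

```python
import xml.etree.ElementTree as ET


def index_by_path(root):
    """Map a tag path such as 'shirt/buttons' to its element (first match wins)."""
    paths = {}

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.setdefault(path, elem)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def carry_over_attributes(base_xml, local_xml, new_xml):
    """Apply attribute changes made between base and local onto new."""
    base = index_by_path(ET.fromstring(base_xml))
    local = index_by_path(ET.fromstring(local_xml))
    new_root = ET.fromstring(new_xml)
    for path, target in index_by_path(new_root).items():
        if path not in base or path not in local:
            continue  # element added in new, or removed locally: leave as is
        old, mine = base[path].attrib, local[path].attrib
        for name in set(old) | set(mine):
            if old.get(name) == mine.get(name):
                continue  # not changed in production
            if name in mine:
                target.set(name, mine[name])   # changed or added locally
            else:
                target.attrib.pop(name, None)  # removed locally
    return new_root


v1 = '<shirt color="red" size="L"><buttons count="20"/><pocket position="left"/></shirt>'
v1a = '<shirt size="M" color="blue"><buttons count="16"/><pocket position="left"/></shirt>'
v2 = '<shirt color="red" size="L" vendor="xyz"><buttons count="16"/><cloth type="silk"/></shirt>'

merged = carry_over_attributes(v1, v1a, v2)
print(ET.tostring(merged, encoding="unicode"))
```

With the shirt example, the production values for color and size are carried onto version 2, while vendor and the new cloth element from version 2 are kept.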
