g t

Reputation: 21

Merge XML files while ignoring duplicate elements

I have two XML files that I want to merge without changing any of the elements that already exist in the original file. What is the best way to do this on a Linux system?

Note: there are posts about using XSLT that seem close to what I need, but I am not sure I have a suitable XSLT processor installed (and I do not have rights to install one). I do have xsltproc installed, but I'm not sure whether it will help. If it would, please provide a suitable command-line example.

Here is a snippet of the original file:

<?xml version="1.0" encoding="utf-8"?>
<config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
   <Comment>This file was automatically generated.</Comment>
   <FieldAttrs>
      <Name>FieldAttrsAll</Name>
      <Field>
         <Name>wLegExchInstIds</Name>
         <Fid>6203</Fid>
         <Type>StringVector</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>

      <Field>
         <Name>wPartitionId</Name>
         <Fid>5886</Fid>
         <Type>Integer</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>
   </FieldAttrs>
</config>

And here is the new file I need to merge:

<?xml version="1.0" encoding="utf-8"?>
<config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
   <Comment>This file was automatically generated.</Comment>
   <FieldAttrs>
      <Name>FieldAttrsAll</Name>
      <Field>
         <Name>wLegExchInstIds</Name>
         <Fid>6203</Fid>
         <Type>StringVector</Type>
         <CheckModified>false</CheckModified>
         <PublishField>false</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>    
      <Field>
         <Name>wPartitionId</Name>
         <Fid>5886</Fid>
         <Type>Integer</Type>
         <CheckModified>false</CheckModified>
         <PublishField>false</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>    
      <Field>
         <Name>wUnverifiedPriceIndicator</Name>
         <Fid>5885</Fid>
         <Type>Bool</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>true</ClearDaily>
      </Field>
      <Field>
         <Name>wCorrIsIrregular</Name>
         <Fid>5884</Fid>
         <Type>Bool</Type>
         <CheckModified>false</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>true</ClearDaily>
      </Field>

   </FieldAttrs>
</config>

In particular, note two things:

  1. the existing values of some of the elements changed in the new file, and
  2. there are new elements added in the new file.

Given the above files, I want the output to look as follows:

<config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
  <Comment>This file was automatically generated.</Comment>
   <FieldAttrs>
      <Name>FieldAttrsAll</Name>
      <Field>
         <Name>wLegExchInstIds</Name>
         <Fid>6203</Fid>
         <Type>StringVector</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>

      <Field>
         <Name>wPartitionId</Name>
         <Fid>5886</Fid>
         <Type>Integer</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>false</ClearDaily>
      </Field>

      <Field>
         <Name>wUnverifiedPriceIndicator</Name>
         <Fid>5885</Fid>
         <Type>Bool</Type>
         <CheckModified>true</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>true</ClearDaily>
      </Field>
      <Field>
         <Name>wCorrIsIrregular</Name>
         <Fid>5884</Fid>
         <Type>Bool</Type>
         <CheckModified>false</CheckModified>
         <PublishField>true</PublishField>
         <ClearDaily>true</ClearDaily>
      </Field>    
   </FieldAttrs>
</config>

Upvotes: 0

Views: 736

Answers (2)

Parfait

Reputation: 107622

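Consider the following XSLT 1.0 stylesheet, which uses the document() function to read a second XML file. It runs against the longer (new) file: the Field elements of the shorter (original) file are copied in first, and any Field in the longer file whose Name already exists in the shorter file is dropped. In other words, it removes duplicates from the longer file rather than adding distinct nodes to the shorter one.

XSLT (save as a .xsl file; it references the second XML file, ShorterXML.xml, which should be saved in the same directory)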

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

 <!-- Identity Transform -->
 <xsl:template match="@*|node()">
   <xsl:copy>
     <xsl:apply-templates select="@*|node()"/>      
   </xsl:copy>
 </xsl:template>  

 <xsl:template match="FieldAttrs">
   <xsl:copy>
     <xsl:copy-of select="Name"/>
     <xsl:copy-of select="document('ShorterXML.xml')/config/FieldAttrs/Field"/>
     <!-- Name is already copied above, so process only Field children to avoid emitting it twice -->
     <xsl:apply-templates select="Field"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="Field[Name=document('ShorterXML.xml')/config/FieldAttrs/Field/Name]"/>

</xsl:transform>

Linux command line (takes only the longer XML file as input; all files are in the same directory)

xsltproc -o output.xml transform.xsl LongerXML.xml

Output

<?xml version="1.0" encoding="UTF-8"?>
<config xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
  <Comment>This file was automatically generated.</Comment>
  <FieldAttrs>
    <Name>FieldAttrsAll</Name>
    <Field>
      <Name>wLegExchInstIds</Name>
      <Fid>6203</Fid>
      <Type>StringVector</Type>
      <CheckModified>true</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>false</ClearDaily>
    </Field>
    <Field>
      <Name>wPartitionId</Name>
      <Fid>5886</Fid>
      <Type>Integer</Type>
      <CheckModified>true</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>false</ClearDaily>
    </Field>
    <Field>
      <Name>wUnverifiedPriceIndicator</Name>
      <Fid>5885</Fid>
      <Type>Bool</Type>
      <CheckModified>true</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>true</ClearDaily>
    </Field>
    <Field>
      <Name>wCorrIsIrregular</Name>
      <Fid>5884</Fid>
      <Type>Bool</Type>
      <CheckModified>false</CheckModified>
      <PublishField>true</PublishField>
      <ClearDaily>true</ClearDaily>
    </Field>
  </FieldAttrs>
</config>
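
For readers who cannot run an XSLT processor at all, the same strategy can be sketched with Python's standard library. This is only an illustration, not part of the stylesheet above: the function name merge_keep_original is hypothetical, and the sketch assumes a single FieldAttrs element whose Field children are identified by their Name.

```python
import xml.etree.ElementTree as ET


def merge_keep_original(original_root, new_root):
    """Rebuild new_root's FieldAttrs so that original Fields win on Name clashes.

    Hypothetical helper: assumes one <FieldAttrs> directly under the root.
    """
    orig_fields = original_root.findall("./FieldAttrs/Field")
    orig_names = {f.findtext("Name") for f in orig_fields}

    attrs = new_root.find("FieldAttrs")
    new_fields = attrs.findall("Field")

    # Fields that appear only in the new (longer) file.
    extra = [f for f in new_fields if f.findtext("Name") not in orig_names]

    # Drop every Field from the new file, then re-add: originals first, extras after.
    for f in new_fields:
        attrs.remove(f)
    attrs.extend(orig_fields + extra)
    return new_root
```

The roots would come from something like ET.parse("ShorterXML.xml").getroot() and ET.parse("LongerXML.xml").getroot(). Note that ElementTree does not keep the original whitespace or unused namespace declarations such as xmlns:xsi, so the serialized result may differ cosmetically from what xsltproc writes.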

Upvotes: 1

choroba

Reputation: 241868

I was able to merge the two files in the given way using xsh, a wrapper around XML::LibXML that uses libxml2 under the hood:

my $old := open old.xml ;
$field := hash Name //Field ;

open new.xml ;
for //Field {
    $exists = xsh:lookup('field', Name) ;
    if not($exists)
        copy . into $old/config/FieldAttrs ;
}

save :f merged.xml $old ;
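
The script indexes the old Field elements by Name and then copies over only those new Field elements whose Name is missing from that index. A rough Python equivalent of that logic might look like the sketch below; append_missing_fields is a hypothetical name, and the ElementTree calls merely stand in for the xsh commands.

```python
import xml.etree.ElementTree as ET


def append_missing_fields(old_root, new_root):
    """Append to old_root every new Field whose Name is not already present.

    Hypothetical helper; returns the list of Names that were added.
    """
    # Index the existing Field elements by Name (the role of the xsh hash).
    known = {f.findtext("Name"): f for f in old_root.iter("Field")}
    target = old_root.find("FieldAttrs")

    added = []
    for field in new_root.iter("Field"):
        name = field.findtext("Name")
        if name not in known:
            target.append(field)   # existing Fields are never touched
            known[name] = field    # guards against duplicates within the new file
            added.append(name)
    return added
```

With the trees loaded via ET.parse("old.xml") and ET.parse("new.xml"), the result could be saved by calling write("merged.xml", encoding="utf-8", xml_declaration=True) on the old tree.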

Upvotes: 0
