Seshagiri Lekkala

Reputation: 53

How can I concatenate multiple XML files?

How can I concatenate multiple XML files from different directories into a single XML file using Perl?

Upvotes: 1

Views: 1307

Answers (1)

Tim

Reputation: 9269

I've had to make quite a lot of assumptions to do this, but here's my answer:

#!/usr/bin/perl -w

use strict;
use XML::LibXML;

my $output_doc = XML::LibXML->load_xml( string => <<EOF);
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects xml:id='total'/>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
</issu-meta> 

EOF

my $object_count = 0;

foreach (@ARGV) {
  my $input_doc = XML::LibXML->load_xml( location => $_ );
  foreach ($input_doc->findnodes('/*[local-name()="issu-meta"]/*[local-name()="basictype"]')) {  # find each object
    my $object = $output_doc->importNode($_, 1);  # import the object information into the output document
    $output_doc->documentElement->appendChild($object);  # append the new XML nodes to the output document root
    $object_count++;  # keep track of how many objects we've seen
  }
}

my $total = $output_doc->getElementById('total');  # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count));  # append the object count to that element
$total->removeAttribute('xml:id');  # remove the XML id, as it's not wanted in the output

print $output_doc->toString;  # output the final document

Firstly, the <comp> element seems to come from nowhere, so I've had to ignore that. I'm also assuming that the required output content before each of the <basictype> elements is always going to be the same, except for the object count.

So I build an empty output document to start with, and then iterate over each filename provided on the command line. For each file, I find each object and copy it into the output document. Once all the input files are done, I insert the object count.
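Since the question mentions files in different directories, the list of filenames doesn't have to be typed by hand. As a rough sketch only: the core File::Find module can collect them. The helper name find_xml_files is hypothetical (it is not part of my script), and I'm assuming that every file ending in .xml under the given directories is wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not part of the script above): return every *.xml
# file found underneath the given directories, sorted by path.
sub find_xml_files {
    my @dirs = @_;
    return unless @dirs;    # nothing to search
    my @found;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path of each entry
            wanted   => sub { push @found, $_ if -f $_ && /\.xml\z/ },
        },
        @dirs,
    );
    return sort @found;
}

# Directories may be given on the command line; 'abc' and 'xyz' are only
# the example directories used in this answer.
my @dirs        = @ARGV ? @ARGV : ( 'abc', 'xyz' );
my @input_files = find_xml_files( grep { -d } @dirs );
print "$_\n" for @input_files;
```

The resulting list could then be looped over in place of @ARGV. Sorting keeps the order of the combined objects stable from one run to the next.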

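The job is made more difficult by the default namespace declaration (xmlns="ver2") on the files. Unprefixed names in an XPath expression only match elements in no namespace, so the search expression has to test local-name() and ends up more complicated than it needs to be. If possible, I'd be tempted to remove the xmlns attributes, which would leave you with: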

foreach ($input_doc->findnodes('/issu-meta/basictype')) {

which is a lot simpler.
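If the xmlns attributes have to stay, another option is to register a prefix for the namespace with XML::LibXML::XPathContext, which gives an expression nearly as short. This is only a sketch: the prefix "v" is an arbitrary choice of mine, and the inline document is a cut-down stand-in for your real input files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# A cut-down input document in the same "ver2" default namespace.
my $input_doc = XML::LibXML->load_xml( string => <<'EOF' );
<issu-meta xmlns="ver2">
  <basictype><id> 1 </id></basictype>
  <basictype><id> 4 </id></basictype>
</issu-meta>
EOF

# Bind a prefix of our own choosing ("v" is arbitrary) to the namespace URI.
my $xpc = XML::LibXML::XPathContext->new($input_doc);
$xpc->registerNs( v => 'ver2' );

# The prefixed names now match the namespaced elements directly.
my @objects = $xpc->findnodes('/v:issu-meta/v:basictype');
print scalar(@objects), " basictype elements found\n";
```

The prefix only exists inside the XPath context; the documents themselves are left untouched and keep their default namespace.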

So, when I run this:

perl combine abc/a.xml xyz/b.xml

I get:

<?xml version="1.0"?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects>3</num-objects>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
<basictype>
       <id> 1 </id>
       <name> pointer </name>
       <pointer/>
       <size> 64 </size>
</basictype><basictype>
     <id> 4 </id>
     <name> int32_t </name>
     <primitive/>
     <size> 32 </size>
 </basictype><basictype>
      <id> 2 </id>
      <name> int8_t </name>
      <primitive/>
      <size> 8 </size>
</basictype></issu-meta>

which is pretty close to what you're after.

Edit: OK, my answer now looks like this:

#!/usr/bin/perl -w

use strict;
use XML::LibXML qw( :libxml );  # load LibXML support and include node type definitions

my $output_doc = XML::LibXML->load_xml( string => <<EOF);  # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects xml:id='total'/>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
</issu-meta> 

EOF

my $object_count = 0;

foreach (@ARGV) {
  my $input_doc = XML::LibXML->load_xml( location => $_ );

  my $import_started = 0;
  foreach ($input_doc->documentElement->childNodes) {
    next unless $_->nodeType == XML_ELEMENT_NODE;  # if it's not an element, ignore it

    if ($_->localName eq 'compatibility') {  # if it's the "compatibility" element, ...
      $import_started = 1;  # ... switch on importing ...
      next;  # ... and move to the next child of the root
    }

    next unless $import_started;  # if we've not started importing, and it's
                                  #   not the "compatibility" element, simply
                                  #   ignore it and move on

    my $object = $output_doc->importNode($_, 1);  # import the object information into the output document
    $output_doc->documentElement->appendChild($object);  # append the new XML nodes to the output document root
    $object_count++;  # keep track of how many objects we've seen
  }
}

my $total = $output_doc->getElementById('total');  # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count));  # append the object count to that element
$total->removeAttribute('xml:id');  # remove the XML id, as it's not wanted in the output

print $output_doc->toString;  # output the final document

This version simply imports every element that is a child of the root <issu-meta> element and comes after the first <compatibility> element it finds and, as before, updates the object count. If I've understood your requirement, that should do the job.

If it works, I strongly suggest you work through both this answer and my earlier one to make sure you understand why it works for your problem. There are lots of useful techniques in here, and once you understand them you will have learned a lot about some of the ways you can manipulate XML. Any problems, ask a new question on this site. Have fun!

Edit #2: Right, this should be the last piece you need:

#!/usr/bin/perl -w

use strict;
use XML::LibXML qw( :libxml );  # load LibXML support and include node type definitions

my @input_files = (
                    'abc/a.xml',
                    'xyz/b.xml',
                  );
my $output_file = 'output.xml';

my $output_doc = XML::LibXML->load_xml( string => <<EOF);  # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects xml:id='total'/>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
</issu-meta> 

EOF

my $object_count = 0;

foreach (@input_files) {
  my $input_doc = XML::LibXML->load_xml( location => $_ );

  my $import_started = 0;
  foreach ($input_doc->documentElement->childNodes) {
    next unless $_->nodeType == XML_ELEMENT_NODE;  # if it's not an element, ignore it

    if ($_->localName eq 'compatibility') {  # if it's the "compatibility" element, ...
      $import_started = 1;  # ... switch on importing ...
      next;  # ... and move to the next child of the root
    }

    next unless $import_started;  # if we've not started importing, and it's
                                  #   not the "compatibility" element, simply
                                  #   ignore it and move on

    my $object = $output_doc->importNode($_, 1);  # import the object information into the output document
    $output_doc->documentElement->appendChild($object);  # append the new XML nodes to the output document root
    $object_count++;  # keep track of how many objects we've seen
  }
}

my $total = $output_doc->getElementById('total');  # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count));  # append the object count to that element
$total->removeAttribute('xml:id');  # remove the XML id, as it's not wanted in the output

$output_doc->toFile($output_file, 1);  # write the final document to $output_file (the 1 requests formatted output)

When run as "perl combine" (no arguments are needed now that the input files are listed in the script), the file output.xml is created with the following contents:

<?xml version="1.0"?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects>7</num-objects>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
<basictype>
       <id> 1 </id>
       <name> pointer </name>
       <pointer/>
       <size> 64 </size>
</basictype><basictype>
     <id> 4 </id>
     <name> int32_t </name>
     <primitive/>
     <size> 32 </size>
 </basictype><enum>
      <id>1835009 </id>
      <name> chkpt_state_t </name>
      <label>
           <name> CHKP_STATE_PENDING </name>
      <value> 1 </value>
      </label>
  </enum><struct>
         <id> 1835010 </id>
          <name> _ipcEndpoint </name>
          <size> 64 </size>
          <elem>
              <id> 0 </id>
              <name> ep_addr </name>
              <type> uint32_t </type>
              <type-id> 8 </type-id>
              <size> 32 </size>
             <offset> 0 </offset>
         </elem>
   </struct><basictype>
      <id> 2 </id>
      <name> int8_t </name>
      <primitive/>
      <size> 8 </size>
</basictype><alias>
     <id> 1835012 </id>
     <name> Endpoint </name>
     <size> 64 </size>
     <type> _ipcEndpoint </type>
     <type-id> 1835010 </type-id>
</alias><bitmask>
      <id> 1835015 </id>
      <name> ipc_flag_t </name>
      <size> 8 </size>
      <type> uint8_t </type>
      <type-id> 6 </type-id>
      <label>
           <name> IPC_APPLICATION_REGISTER_MSG </name>
           <value> 1 </value>
      </label>
 </bitmask></issu-meta>

Last tip: although it makes almost no difference to the XML, it's a little more human-readable once it's been run through xmltidy:

<?xml version="1.0"?>
<issu-meta xmlns="ver2">
  <metadescription>
    <num-objects>7</num-objects>
  </metadescription>
  <compatibility>
    <baseline> 6.2.1.2.43 </baseline>
  </compatibility>
  <basictype>
    <id> 1 </id>
    <name> pointer </name>
    <pointer/>
    <size> 64 </size>
  </basictype>
  <basictype>
    <id> 4 </id>
    <name> int32_t </name>
    <primitive/>
    <size> 32 </size>
  </basictype>
  <enum>
    <id>1835009 </id>
    <name> chkpt_state_t </name>
    <label>
      <name> CHKP_STATE_PENDING </name>
      <value> 1 </value>
    </label>
  </enum>
  <struct>
    <id> 1835010 </id>
    <name> _ipcEndpoint </name>
    <size> 64 </size>
    <elem>
      <id> 0 </id>
      <name> ep_addr </name>
      <type> uint32_t </type>
      <type-id> 8 </type-id>
      <size> 32 </size>
      <offset> 0 </offset>
    </elem>
  </struct>
  <basictype>
    <id> 2 </id>
    <name> int8_t </name>
    <primitive/>
    <size> 8 </size>
  </basictype>
  <alias>
    <id> 1835012 </id>
    <name> Endpoint </name>
    <size> 64 </size>
    <type> _ipcEndpoint </type>
    <type-id> 1835010 </type-id>
  </alias>
  <bitmask>
    <id> 1835015 </id>
    <name> ipc_flag_t </name>
    <size> 8 </size>
    <type> uint8_t </type>
    <type-id> 6 </type-id>
    <label>
      <name> IPC_APPLICATION_REGISTER_MSG </name>
      <value> 1 </value>
    </label>
  </bitmask>
</issu-meta>
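If you'd rather not depend on a separate tidy step, libxml2 can often do the indenting itself. As far as I know, the serializer leaves an element's layout alone when it has whitespace-only text nodes among its children, which is probably why the output above stayed ragged. A sketch, assuming the documents are parsed with the no_blanks option (the sample string here is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Raggedly laid-out sample in the style of the combined output above.
my $xml = <<'EOF';
<issu-meta xmlns="ver2">
<basictype>
       <id> 1 </id>
       <pointer/>
</basictype><basictype>
     <id> 4 </id>
 </basictype></issu-meta>
EOF

# no_blanks drops whitespace-only text nodes at parse time, so they no
# longer get in the way when the tree is re-indented on output.
my $doc = XML::LibXML->load_xml( string => $xml, no_blanks => 1 );

# A format argument of 1 asks toString to add line breaks and indentation.
my $pretty = $doc->toString(1);
print $pretty;
```

In the combining script, the same option would need to be passed to every load_xml call, for the input files as well as the output template, so that the imported nodes carry no stray whitespace either.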

Good luck working through this and taking it further. Do come back to this site to ask more questions when they come up!

Upvotes: 1
