Srini

Reputation: 63

Split XML based on Tag values

Hi, I have an XML file that I want to split into multiple XML files based on the value of a tag inside it.

Example:

<HEADER>
<ROOT>
<TAG1>ABC</TAG1> 
<TAG2>78011DAC8</TAG2> 
<TAG3>US78011DAC83</TAG3> 
</ROOT>
<ROOT>
<TAG1>ABC</TAG1> 
<TAG2>78011DAD6</TAG2> 
<TAG3>US78011DAD66</TAG3> 
</ROOT>
<ROOT>
<TAG1>ABC</TAG1> 
<TAG2>B06983611</TAG2> 
<TAG3>GB0009075325</TAG3> 
</ROOT>
<ROOT>
<TAG1>ABC</TAG1> 
<TAG2>B06983629</TAG2> 
<TAG3>GB0009081828</TAG3> 
</ROOT>
<ROOT>
<TAG1>ABC</TAG1> 
<TAG2>BRS038D62</TAG2> 
<TAG3>FR0010050559</TAG3> 
</ROOT>
<ROOT>
<TAG1>ABC</TAG1> 
<TAG2>BRS49ESZ5</TAG2> 
<TAG3>GB00B1Z5HQ14</TAG3> 
</ROOT>
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>B06983637</TAG2> 
<TAG3>GB0008983024</TAG3> 
</ROOT>
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>BRS26Y2R4</TAG2> 
<TAG3>GB00B128DH60</TAG3> 
</ROOT>
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>BRS1JW2X3</TAG2> 
<TAG3>FR0010235176</TAG3> 
</ROOT>
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>BRS1JW2Y1</TAG2> 
<TAG3>GB00B0CNHZ09</TAG3> 
</ROOT>
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>BRS3BP9P2</TAG2> 
<TAG3>GB00B1L6W962</TAG3> 
</ROOT>
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>BRS7FFAV6</TAG2> 
<TAG3>GB00B3D4VD98</TAG3> 
</ROOT> 
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>B0A07E1X7</TAG2> 
<TAG3>GB0031790826</TAG3> 
</ROOT>
<ROOT>
<TAG1>DEF</TAG1> 
<TAG2>BRS1Z0T57</TAG2> 
<TAG3>GB00B0V3WQ75</TAG3> 
</ROOT>
<ROOT>
<TAG1>XYZ</TAG1> 
<TAG2>BRS9ZDYJ6</TAG2> 
<TAG3>FR0010899765</TAG3> 
</ROOT>
<ROOT>
<TAG1>XYZ</TAG1> 
<TAG2>BRS8ANE14</TAG2> 
<TAG3>DE0001030526</TAG3> 
</ROOT>
<ROOT>
<TAG1>XYZ</TAG1> 
<TAG2>BRS22TXL8</TAG2> 
<TAG3>DE0001030500</TAG3> 
</ROOT>
<ROOT>
<TAG1>XYZ</TAG1> 
<TAG2>BRS5LHPB7</TAG2> 
<TAG3>GB00B24FFM16</TAG3> 
</ROOT>
<ROOT>
<TAG1>XYZ</TAG1> 
<TAG2>B06983223</TAG2> 
<TAG3>GB0008932666</TAG3> 
</ROOT>
</HEADER>

In the example above, I need to check each TAG1 value: if it matches the next TAG1 value, the file should not be split there; if it does not match, a new XML file should start at that point.

I'd appreciate your help!

Upvotes: 2

Views: 717

Answers (3)

Srini

Reputation: 63

I finally found a fix. The code below checks both the number of ROOT elements seen and the TAG1 value before splitting:

#!/usr/bin/perl

use strict;
use warnings;

use autodie qw( open);

use XML::Twig;

my $in_file = $ARGV[0];

my $out_file= "$in_file.p";
my $i="01";
my $current_tag1='';
my $nb_root_in_file  = 0;   # number of ROOT elements seen since the last split
my $MIN_ROOT_IN_FILE = 5;   # no split until more than this many have been seen


my $twig=XML::Twig->new(   
    twig_handlers => { 
       ROOT => sub { my( $t, $root)= @_;
                     $current_tag1||= $root->field( 'TAG1');      # initialize current tag if needed
                     $nb_root_in_file++;

                     # split only when enough ROOTs were seen and TAG1 differs from the current value
                     if( $nb_root_in_file > $MIN_ROOT_IN_FILE && $root->field( 'TAG1') ne $current_tag1)
                       { 
                         $root->cut;                              # get the new root out of the way
                         $t->print_to_file( $out_file. $i++);     # output the part
                         $t->purge;                               # remove the content of the part
                         $root->paste( first_child => $t->root);  # put the new root back in place

                         $current_tag1=  $root->field( 'TAG1'); 
                         $nb_root_in_file = 0;
                       }
                   }
    },
    keep_spaces => 1, # to keep line returns
);

$twig->parsefile($in_file);
$twig->print_to_file( $out_file . $i); # output the last part

Upvotes: 2

mirod

Reputation: 16161

Here is a relatively simple way to do this using XML::Twig. The maximum size kept in memory is a whole sub-file, in case that's important (it would be possible to do better, keeping at most 1 ROOT in memory).

#!/usr/bin/perl

use strict;
use warnings;

use autodie qw( open);

use XML::Twig;

my $in_file = $ARGV[0];

my $out_file= "$in_file.p";
my $i="01";
my $current_tag1='';


my $twig=XML::Twig->new(   
    twig_handlers => { 
       ROOT => sub { my( $t, $root)= @_;
                     $current_tag1||= $root->field( 'TAG1');      # initialize current tag if needed

                     if( $root->field( 'TAG1') ne $current_tag1)  # found a break in the value of TAG1 
                       { 
                         $root->cut;                              # get the new root out of the way
                         $t->print_to_file( $out_file. $i++);     # output the part
                         $t->purge;                               # remove the content of the part
                         $root->paste( first_child => $t->root);  # put the new root back in place

                         $current_tag1=  $root->field( 'TAG1'); 
                       }
                   }
    },
    keep_spaces => 1, # to keep line returns
);

$twig->parsefile($in_file);
$twig->print_to_file( $out_file . $i); # output the last part
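
The lower-memory variant mentioned above (at most one ROOT in memory) could look roughly like the sketch below. It is only a minimal illustration under some assumptions: `split_by_tag1` is a hypothetical helper name, each part is written by hand as the outer element's name wrapped around the ROOT elements, and any attributes on the outer element or an XML declaration are not copied. It also uses `pretty_print` instead of `keep_spaces`, so the original whitespace is not preserved.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

# Hypothetical helper: writes "$in_file.p01", "$in_file.p02", ... and starts a
# new part whenever TAG1 differs from the TAG1 of the previous ROOT.
# Returns the list of file names written.
sub split_by_tag1 {
    my ($in_file) = @_;

    my $part = 0;       # number of parts opened so far
    my $current_tag1;   # TAG1 value of the part being written
    my $out;            # file handle of the part being written
    my $root_tag;       # name of the outer element (HEADER in the sample)
    my @written;

    my $close_part = sub {
        return unless $out;
        print {$out} "</$root_tag>\n";
        close $out or die "cannot close part: $!";
        undef $out;
    };

    my $twig = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => {
            ROOT => sub {
                my ($t, $root) = @_;
                my $tag1 = $root->field('TAG1');

                # first ROOT, or a break in the value of TAG1: start a new part
                if (!defined $current_tag1 || $tag1 ne $current_tag1) {
                    $close_part->();
                    $root_tag = $t->root->tag;
                    my $name = sprintf '%s.p%02d', $in_file, ++$part;
                    open $out, '>', $name or die "cannot write $name: $!";
                    print {$out} "<$root_tag>\n";
                    push @written, $name;
                    $current_tag1 = $tag1;
                }

                print {$out} $root->sprint, "\n";   # write this ROOT right away
                $t->purge;                          # then free what was parsed so far
            },
        },
    );

    $twig->parsefile($in_file);
    $close_part->();

    return @written;
}

if (@ARGV) {
    print "$_\n" for split_by_tag1($ARGV[0]);
}
```

Writing each ROOT as soon as it is parsed means memory use no longer depends on how many ROOT elements share the same TAG1, at the cost of building the outer element by hand.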

Upvotes: 2

user1558455

Reputation:

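Maybe you can parse it with

use XML::Simple;

my $xml = XMLin($your_xml, ForceArray => ['ROOT']);

and then something like

if ($xml->{ROOT}[0]{TAG1} eq $xml->{ROOT}[1]{TAG1}) { ... }

I haven't checked the resulting structure, but by default XMLin discards the outer HEADER element, so the ROOT elements should end up as an array reference under the ROOT key. The TAG1 values are strings, so they need eq rather than ==.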
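
To take this idea a step further, here is a rough sketch that groups consecutive ROOT records with the same TAG1 and writes each group with XMLout. The helper name `group_by_tag1` and the output naming (`.p01`, `.p02`, ...) are assumptions for illustration. It also assumes every TAG1 contains text, and it loads the whole document into memory, so it only suits files that fit comfortably in RAM.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple qw(XMLin XMLout);

# Hypothetical helper: returns a list of array refs, one per run of
# consecutive ROOT records that share the same TAG1 value.
# $xml_source may be a file name or a string of XML, as accepted by XMLin.
sub group_by_tag1 {
    my ($xml_source) = @_;

    # ForceArray keeps ROOT an array even if there is only one;
    # KeyAttr => [] turns off folding of arrays into hashes.
    my $data = XMLin($xml_source, ForceArray => ['ROOT'], KeyAttr => []);

    my @groups;
    for my $root (@{ $data->{ROOT} || [] }) {
        # start a new group on the first record or when TAG1 changes
        if (!@groups || $groups[-1][0]{TAG1} ne $root->{TAG1}) {
            push @groups, [];
        }
        push @{ $groups[-1] }, $root;
    }
    return @groups;
}

if (@ARGV) {
    my $in_file = $ARGV[0];
    my $i = 0;
    for my $group (group_by_tag1($in_file)) {
        # NoAttr writes the hash values as child elements, not attributes
        XMLout({ ROOT => $group },
               RootName   => 'HEADER',
               NoAttr     => 1,
               KeyAttr    => [],
               OutputFile => sprintf('%s.p%02d', $in_file, ++$i));
    }
}
```

XMLout writes hash keys in sorted order, which happens to keep TAG1, TAG2, TAG3 in place for this sample; with other element names the child order may differ from the input.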

Upvotes: 0
