maddy

Reputation: 83

How to split XML using XPath in Perl?

I have an input XML file that I need to split by doc and by delt, saving each piece under a name in this format: delt_0001.xml

This is my code

    #!/usr/bin/perl
    use XML::XPath;

    my $file = 'file.xml';

    my $xp = XML::XPath->new(filename=>$file);

    foreach my $entry ( $xp->findnodes('/xml/service/main/doc') ) {
        my $filename = $entry->findvalue('./delt/@id');
        foreach my $entry1 ( $entry->findnodes('//delt') ) {
            my $filename = $entry1->findvalue('/delt/@id');
            my $content  = $entry1->toString;
            open(wr,">delt_$filename.xml");
            print wr "$content\n";
            close wr;
        }
    }

When I run the program, all the delt elements end up in a single XML file.

Input XML (delt.xml):

<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>

Output I am getting:

    <delt id="0001">
    <title>delt1</title>
    <text>num1</text>
    <text>num1</text>
    </delt>
    <delt id="0002-A">
    <title>delt1</title>
    <text>num1</text>
    <text>num1</text>
    </delt>
    <delt id="0003">
    <title>delt1</title>
    <text>num1</text>
    <text>num1</text>
    </delt>
    <delt id="0004">
    <title>delt1</title>
    <text>num1</text>
    <text>num1</text>
    </delt>

Output needed:

Split no. 1: delt_0001.xml

<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>

Split no. 2: delt_0002-A.xml

<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>

Split no. 3: delt_0003.xml

<xml>
<service>
<title>split xml</title>
<main>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>

Split no. 4: delt_0004.xml

<xml>
<service>
<title>split xml</title>
<main>
<doc id="002">
<title>doc2</title>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>

Thanks in advance

Upvotes: 2

Views: 188

Answers (2)

Sobrique

Reputation: 53478

The reason you're having difficulty is that you're extracting a subset of an XML doc while also trying to include some of the content from its 'parent' elements.

Pulling your 'delts' out on their own would be fairly straightforward.
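As a rough sketch of just that extraction step with XML::XPath (the module used in the question): in XPath, a path starting with `/` or `//` is evaluated from the document root even when called on a node, so `//delt` matches every delt in the file and `/delt/@id` matches nothing here. Relative paths such as `./delt` and `@id`, evaluated against a context node, stay inside the current doc. The inline sample and the `%delt_ids_by_doc` hash are assumptions made for illustration, and this still does not add the parent elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Inline sample standing in for delt.xml (shortened, for illustration only).
my $xml = <<'END_XML';
<xml><service><main>
<doc id="001">
<delt id="0001"><title>delt1</title></delt>
<delt id="0002-A"><title>delt1</title></delt>
</doc>
<doc id="002">
<delt id="0003"><title>delt1</title></delt>
</doc>
</main></service></xml>
END_XML

my $xp = XML::XPath->new( xml => $xml );

my %delt_ids_by_doc;    # doc id => list of delt ids found under that doc
foreach my $doc ( $xp->findnodes('/xml/service/main/doc') ) {
    my $doc_id = sprintf '%s', $xp->findvalue( '@id', $doc );

    # './delt' is evaluated relative to $doc. '//delt' would start again
    # from the document root and match every delt in the file.
    foreach my $delt ( $xp->findnodes( './delt', $doc ) ) {
        my $delt_id = sprintf '%s', $xp->findvalue( '@id', $delt );
        push @{ $delt_ids_by_doc{$doc_id} }, $delt_id;
        print "doc $doc_id -> delt_$delt_id.xml\n";
        print $delt->toString, "\n";
    }
}
```

Each delt is now visited once, under its own doc, with a usable id for the file name.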

I would use XML::Twig for this; it is a perfect place for a twig handler.

I'd be thinking of something along these lines (apologies, this doesn't quite work yet: it only prints each delt and does not write any files):

use strict;
use warnings;
use XML::Twig;

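# Called once for each fully parsed <delt> element.
sub process_delt {
    my ( $twig, $delt ) = @_;
    my $delt_id = $delt->att('id');
    print "\nID:\n$delt_id\n";
    my $filename = "$delt_id.xml";    # not used yet: nothing is written to disk

    # Print only the delt subtree; the enclosing doc, main, service and xml
    # elements are not included.
    $delt->set_pretty_print('indented');
    $delt->print;

    print "\n--------\n";
}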

my $twig = XML::Twig->new(
    twig_handlers => { delt => \&process_delt },
);
local $/;
$twig->parse(<DATA>);


__DATA__
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>

Edit: Take a look at @mirod's answer, because it's fully working. This one just extracts each 'delt', and you would still have to work out how to include the parent elements.
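For what it's worth, finding the parent elements from inside a handler is not hard in itself. A minimal sketch, assuming a shortened inline sample and a hypothetical `@found` list used only to record what was seen: `parent('doc')` returns the enclosing doc, and `ancestors` returns the whole chain up to the root. Rebuilding the wrapper XML around each delt is the part this leaves out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Inline sample standing in for delt.xml (shortened, for illustration only).
my $xml = <<'END_XML';
<xml><service><main>
<doc id="001"><delt id="0001"/><delt id="0002-A"/></doc>
<doc id="002"><delt id="0003"/></doc>
</main></service></xml>
END_XML

my @found;    # one "delt id|doc id|ancestor path" record per delt
XML::Twig->new(
    twig_handlers => {
        delt => sub {
            my ( $twig, $delt ) = @_;
            my $doc = $delt->parent('doc');

            # ancestors() lists the innermost element first, so reverse it
            # to read from the root down to the enclosing doc.
            my $path = join '/', map { $_->tag } reverse $delt->ancestors;

            push @found, join '|', $delt->att('id'), $doc->att('id'), $path;
        },
    },
)->parse($xml);

print "$_\n" for @found;
```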

Upvotes: 0

mirod

Reputation: 16161

It's fairly simple to do this with XML::Twig (and I am happy I got the "delete the current element during parsing" to work a while back):

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $delt= 'delt.xml';

XML::Twig->new( twig_handlers => { delt => \&delt },
                pretty_print => 'indented',
              )
          ->parsefile( $delt);

exit;

sub delt
  { my( $t, $delt)= @_;

    my $delt_file= sprintf( 'delt_%s.xml', $delt->id);

    # the only tricky part: remove previous doc if needed
    if( my $prev_doc= $delt->parent( 'doc')->prev_sibling( 'doc')) 
      { $prev_doc->delete; }

    $t->print_to_file( $delt_file);

    $delt->delete;
  }

Upvotes: 1
