Reputation: 83
I have an input XML file that I need to split per doc and per delt, saving each piece under a name of the form delt_0001.xml (that is, delt_ followed by the delt id).
This is my code:
#!/usr/bin/perl
use XML::XPath;
my $file = 'file.xml';
my $xp = XML::XPath->new(filename=>$file);
foreach my $entry ( $xp->findnodes('/xml/service/main/doc') ) {
    my $filename = $entry->findvalue('./delt/@id');
    foreach my $entry1( $entry->findnodes('//delt')){
        my $filename = $entry1->findvalue('/delt/@id');
        my $content = $entry1->toString;
        open(wr,">delt_$filename.xml");
        print wr "$content\n";
        close wr;
    }
}
When I run the program, all the delt elements end up in one XML file.
Input XML (delt.xml):
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Output I am getting:
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
Output needed:
Split no. 1: delt_0001.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Split no. 2: delt_0002-A.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Split no. 3: delt_0003.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Split no. 4: delt_0004.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="002">
<title>doc2</title>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Thanks in advance
Upvotes: 2
Views: 188
Reputation: 53478
The reason you're having difficulty is that you're extracting a subset of an XML document while also trying to keep some of the content from its parent elements.
Pulling your 'delt' elements out on their own would be fairly straightforward. I would use XML::Twig for this; it is a perfect place for a twig handler.
I'd be thinking something along these lines (apologies, this doesn't quite work yet):
use strict;
use warnings;
use XML::Twig;
sub process_delt {
    my ( $twig, $delt ) = @_;
    my $delt_id = $delt->att('id');
    print "\nID:\n$delt_id\n";
    my $filename = "$delt_id.xml";    # not used yet: nothing is written to disk
    $delt->set_pretty_print('indented');
    $delt->print;                     # prints only the delt, without its parents
    print "\n--------\n";
}
my $twig = XML::Twig->new(
    twig_handlers => { delt => \&process_delt },
);
local $/;    # slurp mode: read all of DATA in one go
$twig->parse(<DATA>);
__DATA__
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Edit: Take a look at @mirod's answer, because it's fully working. This one only extracts each 'delt', so you would still have to work out how to include the parent elements.
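For what it's worth, here is a rough sketch of one way the "parent stuff" could be handled with XML::Twig: copy the delt, then rebuild an empty shell of each ancestor around the copy, carrying over the attributes and the title child. The helper names wrap_with_ancestors and split_delts are made up for this illustration, the sample data is a shortened version of the input from the question, and the sketch assumes title is the only non-delt content worth keeping. It returns the documents in a hash rather than writing files, so treat it as a starting point, not a finished solution.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: rebuild the chain of ancestors around a copy of one delt.
sub wrap_with_ancestors {
    my ($delt) = @_;
    my $inner = $delt->copy;              # deep copy, detached from the tree
    for my $anc ( $delt->ancestors ) {    # nearest first: doc, main, service, xml
        # Empty element with the same tag and a copy of the attributes
        my $shell = XML::Twig::Elt->new( $anc->tag, { %{ $anc->atts || {} } } );
        # Assumption: only the title child of each ancestor is carried over
        if ( my $title = $anc->first_child('title') ) {
            $title->copy->paste( first_child => $shell );
        }
        $inner->paste( last_child => $shell );
        $inner = $shell;
    }
    return $inner;                        # the rebuilt root element
}

# Hypothetical helper: returns { 'delt_<id>.xml' => '<xml>...</xml>', ... }
sub split_delts {
    my ($xml) = @_;
    my %files;
    my $twig = XML::Twig->new(
        twig_handlers => {
            delt => sub {
                my ( $t, $delt ) = @_;
                my $name = sprintf 'delt_%s.xml', $delt->att('id');
                $files{$name} = wrap_with_ancestors($delt)->sprint;
            },
        },
    );
    $twig->parse($xml);
    return \%files;
}

# Shortened sample based on the question's input
my $sample = <<'XML';
<xml><service><title>split xml</title><main>
<doc id="001"><title>doc1</title>
<delt id="0001"><title>delt1</title><text>num1</text></delt>
<delt id="0002-A"><title>delt1</title><text>num1</text></delt>
</doc>
<doc id="002"><title>doc2</title>
<delt id="0003"><title>delt1</title><text>num1</text></delt>
</doc>
</main></service></xml>
XML

my $files = split_delts($sample);
print "$_\n$files->{$_}\n\n" for sort keys %$files;
```

To write real files, each hash value could be printed to a filehandle opened on its key. The trade-off compared with deleting already-processed elements during parsing is that this builds a fresh small tree per delt and leaves the parsed document untouched, at the cost of deciding by hand which parent content to copy.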
Upvotes: 0
Reputation: 16161
It's fairly simple to do this with XML::Twig (and I am happy I got the "delete the current element during parsing" to work a while back):
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $delt= 'delt.xml';
XML::Twig->new( twig_handlers => { delt => \&delt },
                pretty_print  => 'indented',
              )
         ->parsefile( $delt);
exit;
sub delt
  { my( $t, $delt)= @_;
    my $delt_file= sprintf( 'delt_%s.xml', $delt->id);
    # the only tricky part: remove previous doc if needed
    if( my $prev_doc= $delt->parent( 'doc')->prev_sibling( 'doc'))
      { $prev_doc->delete; }
    $t->print_to_file( $delt_file);
    $delt->delete;
  }
Upvotes: 1