Reputation: 43
I've got an XML file with a number of level3 elements. I want to remove all but one of them. My XML file:
<?xml version="1.0" encoding="UTF-8"?>
<level1 id="level1_id">
  <level2 id="level2_id">
    <level3 id="level3_id1">
      <attributes>
        <attribute>1</attribute>
        <attribute>2</attribute>
      </attributes>
    </level3>
    <level3 id="level3_id2">
      <attributes>
        <attribute>1</attribute>
        <attribute>2</attribute>
      </attributes>
    </level3>
    <level3 id="level3_id3">
      <attributes>
        <attribute>1</attribute>
        <attribute>2</attribute>
      </attributes>
    </level3>
  </level2>
</level1>
My Perl script:
use strict;
use warnings;
use XML::Twig;

my $filename = 'test3.xml';
my $outfile  = $filename . "_after";
# parentheses needed: "open ... $outfile or die" would bind "or" to $outfile
open( my $output, '>', $outfile ) or die "Couldn't open output file: $!\n";
my $twig = new XML::Twig (twig_handlers => { 'level2' => \&edit });
$twig->parsefile($filename);
#$twig->flush;
$twig->print($output);

sub edit {
    my ( $twig, $element ) = @_;
    my @elements = $element->children('level3');
    print $#elements . "\n";
    @elements = @elements[ 1 .. $#elements ];    # everything except the first
    print $#elements . "\n";
    my $count = 0;
    foreach (@elements) {
        $count++;
        $_->delete;
    }
    print $count;
    $twig->purge;
}
However, this leaves only the empty level1 element:
<?xml version="1.0" encoding="UTF-8"?>
<level1 id="level1_id"></level1>
On the other hand, my script works fine when level2 is the root element. Here is an example XML file and the result after processing:
<?xml version="1.0" encoding="UTF-8"?>
<level2 id="level2_id">
  <level3 id="level3_id1">
    <attributes>
      <attribute>1</attribute>
      <attribute>2</attribute>
    </attributes>
  </level3>
  <level3 id="level3_id2">
    <attributes>
      <attribute>1</attribute>
      <attribute>2</attribute>
    </attributes>
  </level3>
  <level3 id="level3_id3">
    <attributes>
      <attribute>1</attribute>
      <attribute>2</attribute>
    </attributes>
  </level3>
</level2>
Result:
<?xml version="1.0" encoding="UTF-8"?>
<level2 id="level2_id">
  <level3 id="level3_id1">
    <attributes>
      <attribute>1</attribute>
      <attribute>2</attribute>
    </attributes>
  </level3>
</level2>
This is exactly what I want: just one level3 element left. What am I doing wrong? Does it have to do with how I define the twig handlers? I don't want to hard-code the XML structure, e.g.:
my $twig = new XML::Twig (twig_handlers => { 'level1/level2' => \&edit });
I don't know how deep level2 will be in an actual XML file, and the actual files might not be identical in structure, so this part should be dynamic.
Upvotes: 0
Views: 233
Reputation: 53478
I would suggest that, unless you specifically want to do incremental parsing on a large file, twig_handlers are needlessly complicated. They are a powerful tool if you want to treat XML as a stream and modify or discard parts of it, but usually just loading the whole XML document and working with it is simpler and clearer.
What you want appears to be deleting every 'level3' element after the first one.
So:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new->parsefile('your_xml_file.xml');

my $count = 0;
foreach my $level3 ( $twig->get_xpath('.//level3') ) {
    # delete every level3 after the first one
    $level3->delete if $count++;
}

# set formatting
$twig->set_pretty_print('indented_a');

# print to stdout
$twig->print;
Upvotes: 0
Reputation: 126722
There is no need for the line $twig->purge, or anything like it, and I don't understand why you have written it.
It discards everything that has been parsed but not yet printed to the output, which is the whole of the level2 element that you have just edited. That is why only the empty level1 element is left when $twig->print runs at the end.
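If the intent behind `purge` was to keep memory use down on a large file, the XML::Twig method that does this without losing data is `flush`: `purge` frees everything parsed so far without printing it, whereas `flush` prints it to the given filehandle and then frees it. The following is a minimal sketch of that streaming variant, not the asker's code. It assumes the input is held in a string and collects the output in an in-memory filehandle purely for demonstration. `$xml`, `$result` and `keep_first_level3` are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Sample input: level2 is nested under level1, as in the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<level1 id="level1_id">
<level2 id="level2_id">
<level3 id="level3_id1"><attributes><attribute>1</attribute></attributes></level3>
<level3 id="level3_id2"><attributes><attribute>1</attribute></attributes></level3>
<level3 id="level3_id3"><attributes><attribute>1</attribute></attributes></level3>
</level2>
</level1>
XML

# Collect output in memory for the demo; a real script would open a file.
my $result = '';
open( my $output, '>', \$result ) or die "Couldn't open output: $!";

my $twig = XML::Twig->new(
    twig_handlers => { level2 => \&keep_first_level3 },
);
$twig->parse($xml);
$twig->flush($output);    # print what is still buffered (the closing level1 tag)
close $output;
print $result;

sub keep_first_level3 {
    my ( $t, $level2 ) = @_;
    my @level3 = $level2->children('level3');
    $_->delete for @level3[ 1 .. $#level3 ];    # keep only the first level3
    # Print everything parsed so far (including this level2), then free it.
    $t->flush($output);
    return 1;
}
```

`flush` also has to be called once after parsing, to print the closing tags of any elements (here `level1`) that were still open when the last handler ran.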
I also recommend that you write:
my $twig = XML::Twig->new(
    twig_handlers => { level2 => \&edit },
    pretty_print  => 'indented',
);
instead of new XML::Twig (...), because the indirect object syntax that you have used is ambiguous and prone to errors. The pretty_print option will also make the output XML more readable.
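Putting the constructor above together with dropping `purge` gives a version of the original script that also covers the concern about not hard-coding the structure. A `twig_handlers` key that is a bare tag name such as `level2` matches that element at any depth, so no `level1/level2` path is needed. This is a minimal sketch, assuming the input is an inline string in which `level2` appears at two different depths. The `wrapper` element and the ids are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# level2 appears at two different depths; the handler key is just the tag name.
my $xml = <<'XML';
<level1 id="level1_id">
  <level2 id="a">
    <level3 id="a1"/><level3 id="a2"/><level3 id="a3"/>
  </level2>
  <wrapper>
    <level2 id="b">
      <level3 id="b1"/><level3 id="b2"/>
    </level2>
  </wrapper>
</level1>
XML

my $twig = XML::Twig->new(
    twig_handlers => { level2 => \&edit },    # fires for level2 at any depth
    pretty_print  => 'indented',
);
$twig->parse($xml);

my $result = $twig->sprint;    # the whole (edited) document as a string
print $result;

sub edit {
    my ( $t, $level2 ) = @_;
    my @level3 = $level2->children('level3');
    # Keep the first level3 child of this level2 and delete the rest; no purge.
    $_->delete for @level3[ 1 .. $#level3 ];
    return 1;
}
```

Because this version prints the document only after parsing has finished, nothing is discarded. The trade-off is that the entire document is held in memory.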
Upvotes: 1