AvinashK

Reputation: 3423

Delete a particular node in XML using Perl

I have a package.xml file with the following structure:

<package name="com/avinash/foo1">
    <sourcefile name="bar1.java">
        <line no="1" mi="3"/>
        <line no="3" mi="2"/>
    </sourcefile>
    <sourcefile name="bar2.java">
        <line no="1" mi="5"/>
        <line no="6" mi="8"/>
        <line no="7" mi="3"/>
    </sourcefile>
</package>
<package name="com/avinash/foo2">
.
.
.
.
</package>

Using Perl, I have to delete all the line nodes for which no="1". I have read that splice can be used to delete nodes from the parsed XML, and I have written the following code to do that:

my $xmlFilePath = 'package.xml';
use XML::Simple;
my $xs = XML::Simple->new (ForceArray => 1);
my $ref = $xs->XMLin($xmlFilePath);

foreach(@{$ref->{'package'}}) {
    my %packageTag = %{$_};        

    foreach(@{$packageTag{'sourcefile'}}){
        my %sourcefileTag = %{$_};

        my $lineCtr = 0;

        foreach(@{$sourcefileTag{'line'}}){
            my %lineTag = %{$_};

            if($lineTag{'no'}==1){
                #splice : something like "splice @{$ref{$packageTag{$sourcefileTag->{'line'}}}}, $lineCtr, 1;"
            }

            $lineCtr = $lineCtr + 1;

        }
    }
}

I am a newbie and very confused about converting between @, % and $ in Perl. I do not know how to write the array part (the first argument) of the splice call. Can anyone tell me what splice call would delete the line node?

Thanks in advance.

Upvotes: 0

Views: 338

Answers (3)

Sobrique

Reputation: 53478

Deleting nodes using XML::Twig:

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        # Delete every <line> element whose no attribute is "1".
        'line[@no="1"]' => sub { $_->delete },
    },
);
$twig->parsefile('your_file');
$twig->print;

You can use parsefile_inplace with XML::Twig to do this too:

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { 'line[@no="1"]' => sub { $_->delete } },
);
$twig->parsefile_inplace('your_file');

Or you can simply manipulate your parsed XML:

my $twig = XML::Twig->new( 'pretty_print' => 'indented' );
$twig->parsefile ('your_file'); 
foreach my $line ( $twig->get_xpath('//line') ) {
    if ( $line->att("no") eq "1" ) {
        $line->delete;
    }
}
$twig->print;

Upvotes: 0

hobbs

Reputation: 239861

As an alternative to XML::Simple, here's a solution using XML::Twig, which has the advantage of not loading the entire document into memory (useful if your input file is large) while remaining rather simple.

use XML::Twig;

my $twig = XML::Twig->new(
  twig_roots => {
    'package/sourcefile/line' => \&handle_line,
  },
  twig_print_outside_roots => 1,
);

sub handle_line {
  my ($twig, $line) = @_;
  $line->print unless $line->att('no') == 1;
} 

$twig->parsefile('package.xml');

Yep, it's that easy. twig_print_outside_roots says that anything that isn't a line element inside a sourcefile inside a package should be printed to the output without any processing, while those line elements should be passed to the handle_line sub for processing. handle_line simply checks if the element's no attribute is 1, and prints the element only if it isn't.

This reads from package.xml and prints to standard output, which you can redirect to a new file. Or you can modify it to print to a file directly by opening the file yourself, and passing the filehandle to both twig_print_outside_roots and the print method.
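A minimal sketch of that filehandle variant, under some assumptions: the sample document and its `<packages>` root element are made up for illustration, and an in-memory filehandle stands in for a real output file so the example is self-contained. The same handle is given to both twig_print_outside_roots and the element's print method.

```perl
use strict;
use warnings;
use XML::Twig;

# Assumption: a made-up document with a single <packages> root element.
my $xml = <<'XML';
<packages>
  <package name="com/avinash/foo1">
    <sourcefile name="bar1.java">
      <line no="1" mi="3"/>
      <line no="3" mi="2"/>
    </sourcefile>
  </package>
</packages>
XML

# An in-memory filehandle keeps the sketch self-contained;
# a handle opened on a real file would be used the same way.
open my $out, '>', \my $buffer or die "Cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_roots => {
        'package/sourcefile/line' => sub {
            my ( $t, $line ) = @_;
            # Print the element to the same handle unless no is 1.
            $line->print($out) unless $line->att('no') == 1;
        },
    },
    # Everything outside the roots is copied to this handle.
    twig_print_outside_roots => $out,
);

$twig->parse($xml);
close $out or die "Cannot close handle: $!";

print $buffer;
```

For a real file, replacing the in-memory open with an ordinary open for writing and `parse($xml)` with `parsefile('package.xml')` should be all that changes.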

Upvotes: 1

Nick P

Reputation: 769

I'll second the recommendation not to use XML::Simple, but if you're going ahead with it anyway, here is some advice, since I think there are other issues to discuss.

You shouldn't splice inside a for/foreach loop: you'd be modifying the array you are looping over, which causes all kinds of problems.

To filter a list, use grep on the whole array instead of removing elements from inside a loop over it.
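A minimal sketch of the difference, using a hand-built structure instead of parsed XML (the data and variable names are made up for illustration). grep builds a new list that replaces the old one; a splice-based version is only safe if it walks the indices from the end.

```perl
use strict;
use warnings;

# Hypothetical data shaped like one <sourcefile> entry
# as XMLin returns it with ForceArray => 1.
my $sourcefile = {
    name => 'bar2.java',
    line => [
        { no => 1, mi => 5 },
        { no => 6, mi => 8 },
        { no => 7, mi => 3 },
    ],
};

# grep: build a filtered list, then store a reference to it in the hash.
my @kept = grep { $_->{no} != 1 } @{ $sourcefile->{line} };
$sourcefile->{line} = \@kept;

# splice alternative: iterate over the indices in reverse so that a
# removal never shifts an element that has not been visited yet.
my @lines = ( { no => 1 }, { no => 1 }, { no => 3 } );
for my $i ( reverse 0 .. $#lines ) {
    splice @lines, $i, 1 if $lines[$i]{no} == 1;
}

print scalar( @{ $sourcefile->{line} } ), " line(s) kept by grep\n";
print scalar(@lines), " line(s) kept by splice\n";
```

When the array lives behind a reference, the first argument to splice is the dereferenced array, for example `splice @{ $sourcefile->{line} }, $i, 1`.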

Also, your example file does not work for me as posted. I had to add more to the XML file (an XML declaration and a single containing root element) or XML::Simple complains.

And finally, the name attribute is special (yet another reason not to use XML::Simple): by default, XML::Simple folds nested elements into a hash keyed on their name, key or id attribute. You need to pass KeyAttr => [] to stop it folding your data up.

Try the below.

use XML::Simple;
my $xs = XML::Simple->new (ForceArray => 1, KeyAttr => []);
my $packages = $xs->XMLin('package.xml');

for my $package (@{$packages->{'package'}}) {
    for my $sourcefile ( @{$package->{'sourcefile'}} ) {
        my $lines = $sourcefile->{'line'};

        my @filtered = grep { $_->{'no'} != 1 } @{$lines};
        $sourcefile->{'line'} = \@filtered;
    }   
}

Upvotes: 1
