trivial

Reputation: 141

How can I delete elements using Perl's XML::Twig?

I have some XML files like the following:

<machines>
<server>
    127.0.0.1
</server>
<proxy>
    <ip>127.0.0.2</ip>
    <etc>abc</etc>
</proxy>
</machines>

I want to keep the server element and delete the others. The output should be:

<machines>
<server>
127.0.0.1
</server>
</machines>

I wrote the following script:

use warnings;
use strict;
use feature ':5.10';
use XML::Twig;

my $path='C:\strawberry\perl\site\lib\file.xml';
my $filehandle;
my $tweak_server =sub{
    my ($twig, $root) =@_;
    my $elt=$root;
    while( $elt=$elt->next_elt($root)){
        my $tag=$elt->tag;
        say $tag;
        if ($tag!~/server/){
            $elt->delete($tag);         
        }       
    }
    $twig->flush;
};




open( $filehandle, "+<$path") or die "cannot open out file out_file:$!";
my $roots = { machines => 1 };
my $handlers = { 'machines' => $tweak_server,
            };
my $twig = new XML::Twig(TwigRoots => $roots,
                 TwigHandlers => $handlers,
                 pretty_print  => 'indented'#,
                # twig_print_outside_roots => \*$filehandle
                 );
$twig->parsefile($path);
close $filehandle;

and got the output:

server
#PCDATA
<machines>
<server></server>
<proxy>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</proxy>
</machines>

I don't understand why "#PCDATA" is printed, or why the script doesn't work as I expect.

@mirod I tried the following:

use warnings;
use strict;
use feature ':5.10';
use XML::Twig;

my $tweak_server =sub{
my ($twig, $root) =@_;
my $elt=$root;
my $text=$elt->first_child_text('id');
if ($text=~m/12/){
    while( $elt=$elt->next_elt('#ELT')){
        my $tag=$elt->tag;
        say $tag;
        if ($tag!~/id/){
            $elt->delete;           
        }       
    }
}
};

my $roots = { machines => 1 };
my $handlers = { 'machines/aaa' => $tweak_server,
            };
my $twig =XML::Twig->new(TwigRoots => $roots,
                 TwigHandlers => $handlers,
                 pretty_print  => 'indented'#,
                # twig_print_outside_roots => \*$filehandle
                 )
    ->parse( \*DATA) 
    ->print; 
__DATA__

<machines> 
<server> 127.0.0.1 </server> 
<aaa>
<id>12</id> 
<ip>127.0.0.2</ip>   
<option>127.0.0.6</option>
<etc>abc</etc>
</aaa> 
<aaa>
<id>14</id> 
<ip>127.0.0.2</ip>   
<etc>abc</etc>
</aaa> 
<aaa>
<id>15</id> 
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
</machines>

and the output is:

<machines>
<server> 127.0.0.1 </server>
<aaa>
<id>12</id>
<option>127.0.0.6</option>
<etc>abc</etc>
</aaa>
<aaa>
<id>14</id>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
<aaa>
<id>15</id>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
</machines>

What I want is to delete all three of these elements, not just one:

<ip>127.0.0.2</ip>   
<option>127.0.0.6</option>
<etc>abc</etc>

in the aaa element that contains

 <id>12</id>

Any suggestions?

Upvotes: 2

Views: 2540

Answers (2)

mirod

Reputation: 16161

If your requirement is to keep only the server elements, you can tell the module so by declaring them as twig_roots. This has the effect of keeping the root of the XML and the server elements (and their content), while discarding everything else:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

XML::Twig->new( twig_roots => { server => 1 },
                pretty_print => 'indented',
              )
         ->parse( \*DATA)
         ->print;

__DATA__
<machines>
<server>
    127.0.0.1
</server>
<proxy>
    <ip>127.0.0.2</ip>
    <etc>abc</etc>
</proxy>
</machines>
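
For the follow-up case (keep only id inside the aaa element whose id is 12), here is a minimal sketch rather than a definitive solution. The loop in the question most likely stops after the first deletion because next_elt is then called on an element that has already been cut out of the tree. Looping over the list returned by children avoids that, since the list is built before anything is deleted. The names $xml and $result are illustrative, the sample data is shortened, and the exact comparison with eq '12' is an assumption about what "matches 12" should mean.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened sample data, based on the XML in the question
my $xml = <<'XML';
<machines>
<server> 127.0.0.1 </server>
<aaa>
<id>12</id>
<ip>127.0.0.2</ip>
<option>127.0.0.6</option>
<etc>abc</etc>
</aaa>
<aaa>
<id>14</id>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
</machines>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each fully parsed aaa element
        aaa => sub {
            my ($t, $aaa) = @_;
            # Leave every aaa alone unless its id is exactly 12
            return unless $aaa->first_child_text('id') eq '12';
            # children() returns a plain list, so deleting inside the loop is safe
            for my $child ($aaa->children) {
                $child->delete unless $child->tag eq 'id';
            }
        },
    },
    pretty_print => 'indented',
);
$twig->parse($xml);

# sprint returns the document as a string instead of printing it
my $result = $twig->sprint;
print $result;
```

The aaa element with id 12 should then contain only its id child, while the other aaa elements are left untouched.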

Upvotes: 2

toolic

Reputation: 62037

The following will delete the proxy elements:

use warnings;
use strict;
use XML::Twig;

my $str = '
<machines>
<server>
    127.0.0.1
</server>
<proxy>
    <ip>127.0.0.2</ip>
    <etc>abc</etc>
</proxy>
</machines>
';

my $t = XML::Twig->new(
        twig_handlers => {
            proxy => sub { $_->delete() },
        },
        pretty_print  => 'indented',
);
$t->parse($str);
$t->print;
print "\n";

__END__

<machines>
  <server>
    127.0.0.1
</server>
</machines>

If you don't want to print out server and #PCDATA, then get rid of say $tag;.
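
As for where #PCDATA comes from: XML::Twig represents text content as elements of its own, with the tag #PCDATA, so a walk over the tree visits them alongside the real elements. In the original script the text inside server is therefore the node that fails the /server/ test and gets deleted, which would explain the empty server element in the output. A small sketch to illustrate (the variable names are made up for this example), using the '#ELT' condition to restrict navigation to real elements:

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse('<machines><server>127.0.0.1</server></machines>');

# Without a condition, text nodes are returned too, tagged #PCDATA
my @all_tags = map { $_->tag } $twig->root->descendants;

# With '#ELT', only real elements are returned
my @elt_tags = map { $_->tag } $twig->root->descendants('#ELT');

print "all: @all_tags\n";
print "elements only: @elt_tags\n";
```

The first line should list both server and #PCDATA, the second only server.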

Upvotes: 2
