henryl

Reputation: 11

Read, delete some records and write the same XML file using Perl XML::Simple

I have been struggling with this at work today. I am trying to read in an XML file like the one below (which I have just quickly typed in). I have a CSV file of show_id codes, which I read in and put in a hash. Then I read in the XML file using XML::Simple.

I then compare the show_id code in each element below against the hash table to see if I have a match. (I loop over the array as in the online examples, and $a = $data->{Element1}->{Element2}->{show_id} found the value.) Bingo. I got that to work with no problem.

So let's say I match the middle two Element2 elements with show_id values of ABC11 and ABC12. Now I need to write a new file of the ones that do match. So I tried doing XMLout and I seem to lose the whole tag structure that I read in.

Is there any way to read in the data below, get rid of the records ABC10 and ABC14 for instance, and write out the file in the same format? Let me know if that makes sense.

Also I only have XML::Simple and XML::Parser installed at work. Please HELP!!!

<?xml version="1.0" encoding="ISO-8859-1"?>
<main>
  <Element1>
    <Element2>
        <show/>
        <show_id>ABC10</show_id>
        <staring>
            <show_header>This is a test</show_header>
        </staring>
    </Element2>
        <Element2>
            <show/>
            <show_id>ABC11</show_id>
            <staring>
                <show_header>This is a test</show_header>
            </staring>
    </Element2>
        <Element2>
            <show/>
            <show_id>ABC12</show_id>
            <staring>
                <show_header>This is a test</show_header>
            </staring>
    </Element2>
        <Element2>
            <show/>
            <show_id>ABC14</show_id>
            <staring>
                <show_header>This is a test</show_header>
            </staring>
    </Element2>
  </Element1>
</main>

Upvotes: 1

Views: 1158

Answers (3)

Borodin

Reputation: 126722

If you are able to get XML::Twig installed, this is a solution you may prefer.

use strict;
use warnings;

use XML::Twig;

my %keep = (
  ABC11 => 1,
  ABC12 => 1,
);

my $twig = XML::Twig->new(
  keep_spaces => 1,
  twig_handlers => { Element2 => \&Element2 }
);  

$twig->parsefile('data.xml');
$twig->print;

sub Element2 {
  my ($twig, $elem) = @_;
  my $show_id = $elem->first_child_text('show_id');
  $elem->delete unless $keep{$show_id};
}

Or, if you prefer XML::LibXML, then this will work:

use strict;
use warnings;

use XML::LibXML;

my %keep = (
  ABC11 => 1,
  ABC12 => 1,
);

my $xml = XML::LibXML->load_xml(location => 'data.xml');

for my $elem2 ($xml->findnodes('//Element2')) {
  my $show_id = $elem2->findvalue('show_id');
  $elem2->parentNode->removeChild($elem2) unless $keep{$show_id};
}

print $xml->toString;

The output of these programs is identical.

output

<?xml version="1.0" encoding="ISO-8859-1"?>
<main>
  <Element1>

        <Element2>
            <show/>
            <show_id>ABC11</show_id>
            <staring>
                <show_header>This is a test</show_header>
            </staring>
    </Element2>
        <Element2>
            <show/>
            <show_id>ABC12</show_id>
            <staring>
                <show_header>This is a test</show_header>
            </staring>
    </Element2>

  </Element1>
</main>
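
The %keep hash in the programs above is hard-coded, whereas the question reads the show_id codes from a CSV file. A minimal sketch of building the hash from such a file follows. It assumes the show_id is the first comma-separated field on each line and that fields are not quoted; the load_keep helper and the sample data are hypothetical, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Build a lookup hash of show_id codes from CSV lines read from a filehandle.
# Assumption: the show_id is the first comma-separated field, with no quoting.
sub load_keep {
    my ($fh) = @_;
    my %keep;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;            # strip LF or CRLF line endings
        next unless length $line;        # skip blank lines
        my ($show_id) = split /,/, $line;
        next unless defined $show_id;    # line contained only separators
        $show_id =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        $keep{$show_id} = 1 if length $show_id;
    }
    return %keep;
}

# Demonstration with an in-memory file (hypothetical sample data).
my $csv = "ABC11,first show\nABC12,second show\n";
open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";
my %keep = load_keep($fh);
close $fh;

print join(',', sort keys %keep), "\n";   # prints ABC11,ABC12
```

For a real file, open it by name instead of using the in-memory string and pass that filehandle to load_keep. If the CSV may contain quoted fields, a proper CSV parser would be a safer choice than split.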

Upvotes: 2

runrig

Reputation: 6524

If you want the same thing going out as coming in, don't use XML::Simple. Here's a solution using XML::Rules:

use strict;
use warnings;

use XML::Rules;

my @keep_these = qw(
  ABC11
  ABC12
);
my %keep; $keep{$_}++ for @keep_these;

my @rules = (
  Element2 => sub {
    my $id = $_[1]->{show_id}{_content};
    return unless $keep{$id};
    return $_[0] => $_[1];
  },
);
my $p = XML::Rules->new(
  style => 'filter',
  rules => \@rules,
  stripspaces => 3,
);

$p->filter(\*DATA, \*STDOUT);

__END__
<?xml version="1.0" encoding="ISO-8859-1"?>
<main>
  <Element1>
    <Element2>
etc.

Upvotes: 1

yasu

Reputation: 1364

First, get rid of the unwanted elements:

$data->{Element1}{Element2} = [
  grep { $_->{show_id} =~ /^ABC1[12]$/ } @{$data->{Element1}{Element2}}
];

Then write the data out in XML format. (With NoAttr => 1, hashes are represented as nested elements instead of attributes.)

print XMLout($data, NoAttr => 1, RootName => "main");

You can pass KeepRoot => 1 to both XMLin and XMLout to preserve the root element ("main") instead of passing RootName => "main". If you do so, use $data->{main}{Element1}{Element2}.
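
Putting the two steps together, a minimal end-to-end sketch might look like the following. It assumes the KeepRoot => 1 variant and adds ForceArray so that Element2 is always an array reference (without it, a file containing a single Element2 would come back as a plain hash and the array dereference would fail). The inline sample and the hard-coded %keep hash are stand-ins for the real XML and CSV files.

```perl
use strict;
use warnings;

use XML::Simple;

# IDs to keep; in the question these would come from the CSV file.
my %keep = map { $_ => 1 } qw(ABC11 ABC12);

# Inline sample standing in for the real file (assumed to match the question's layout).
my $xml = <<'END_XML';
<main>
  <Element1>
    <Element2><show/><show_id>ABC10</show_id><staring><show_header>This is a test</show_header></staring></Element2>
    <Element2><show/><show_id>ABC11</show_id><staring><show_header>This is a test</show_header></staring></Element2>
    <Element2><show/><show_id>ABC12</show_id><staring><show_header>This is a test</show_header></staring></Element2>
    <Element2><show/><show_id>ABC14</show_id><staring><show_header>This is a test</show_header></staring></Element2>
  </Element1>
</main>
END_XML

my $data = XMLin(
    $xml,
    KeepRoot   => 1,              # keep <main> as the top-level hash key
    ForceArray => ['Element2'],   # always an array ref, even for one record
    KeyAttr    => [],             # do not fold arrays into hashes
);

# Keep only the records whose show_id is in %keep.
my $element1 = $data->{main}{Element1};
$element1->{Element2} = [
    grep { $keep{ $_->{show_id} } } @{ $element1->{Element2} }
];

my $out = XMLout(
    $data,
    KeepRoot => 1,                # use the single top-level key as the root element
    NoAttr   => 1,                # emit nested elements rather than attributes
    KeyAttr  => [],
    XMLDecl  => '<?xml version="1.0" encoding="ISO-8859-1"?>',
);
print $out;
```

Note that XML::Simple does not preserve the original whitespace or the order of child elements, so the result is an equivalent document rather than a byte-for-byte copy of the input.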

Upvotes: 1
