Reputation: 11
I have been struggling with this at work today. I am trying to read in an XML file like the one below (which I have just typed in quickly). I have a CSV file of show_id codes, so I read them in and put them in a hash. Then I read in the XML file using XML::Simple.
I then loop over the elements (using an array, as in the online examples; $a = $data->{Element1}->{Element2}->{show_id} found the value), compare each show_id code against the hash, and see if I have a match. Bingo. I got that to work with no problem.
So let's say I match the middle two Element2 elements, with show_id values of ABC11 and ABC12. Now I need to write a new file containing only the ones that match. I tried XMLout, but I seem to lose the whole tag structure that I read in.
Is there any way to read in the data below, get rid of the records ABC10 and ABC14 for instance, and write out the file in the same format? Let me know if that makes sense.
Also, I only have XML::Simple and XML::Parser installed at work. Please HELP!!!
<?xml version="1.0" encoding="ISO-8859-1"?>
<main>
<Element1>
<Element2>
<show/>
<show_id>ABC10</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
<Element2>
<show/>
<show_id>ABC11</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
<Element2>
<show/>
<show_id>ABC12</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
<Element2>
<show/>
<show_id>ABC14</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
</Element1>
</main>
Upvotes: 1
Views: 1158
Reputation: 126722
If you are able to get XML::Twig installed, this is a solution you may prefer.
use strict;
use warnings;
use XML::Twig;

my %keep = (
    ABC11 => 1,
    ABC12 => 1,
);

my $twig = XML::Twig->new(
    keep_spaces   => 1,
    twig_handlers => { Element2 => \&Element2 }
);
$twig->parsefile('data.xml');
$twig->print;

sub Element2 {
    my ($twig, $elem) = @_;
    my $show_id = $elem->first_child_text('show_id');
    $elem->delete unless $keep{$show_id};
}
Or, if you prefer XML::LibXML, then this will work.
use strict;
use warnings;
use XML::LibXML;

my %keep = (
    ABC11 => 1,
    ABC12 => 1,
);

my $xml = XML::LibXML->load_xml(location => 'data.xml');

for my $elem2 ($xml->findnodes('//Element2')) {
    # findvalue returns the text of show_id as a plain string
    my $show_id = $elem2->findvalue('show_id');
    $elem2->parentNode->removeChild($elem2) unless $keep{$show_id};
}

print $xml->toString;
The output of these programs is identical.
output
<?xml version="1.0" encoding="ISO-8859-1"?>
<main>
<Element1>
<Element2>
<show/>
<show_id>ABC11</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
<Element2>
<show/>
<show_id>ABC12</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
</Element1>
</main>
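The %keep hash in both programs is hard-coded for brevity, whereas the question describes loading the show_id codes from a CSV file. Here is a minimal sketch of that step using only core Perl. It assumes the show_id is the first comma-separated field on each line and that fields are not quoted; load_keep and the sample data are hypothetical names made up for illustration. The demo reads from an in-memory string so it runs on its own; with a real file you would open the file name instead.

```perl
use strict;
use warnings;

# Build a lookup hash from a filehandle of CSV lines.
# Assumption: show_id is the first comma-separated field on each line
# and fields are not quoted.
sub load_keep {
    my ($fh) = @_;
    my %keep;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip LF or CRLF line ending
        next unless length $line;          # skip empty lines
        my ($show_id) = split /,/, $line;  # first field only
        next unless defined $show_id;
        $show_id =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
        $keep{$show_id} = 1 if length $show_id;
    }
    return %keep;
}

# In-memory demo data (hypothetical); for a real file use something like
#   open my $fh, '<', 'show_ids.csv' or die "Cannot open show_ids.csv: $!";
my $csv = "ABC11,first show\nABC12,second show\n";
open my $fh, '<', \$csv or die "Cannot open in-memory CSV: $!";
my %keep = load_keep($fh);
close $fh;

print join(',', sort keys %keep), "\n";    # prints ABC11,ABC12
```

If the real CSV has quoted fields or embedded commas, a proper CSV parser would be a safer choice than split.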
Upvotes: 2
Reputation: 6524
If you want the same thing going out as coming in, don't use XML::Simple. Here's a solution using XML::Rules:
use strict;
use warnings;
use XML::Rules;

my @keep_these = qw(
    ABC11
    ABC12
);
my %keep; $keep{$_}++ for @keep_these;

my @rules = (
    Element2 => sub {
        my $id = $_[1]->{show_id}{_content};
        return unless $keep{$id};
        return $_[0] => $_[1];
    },
);

my $p = XML::Rules->new(
    style       => 'filter',
    rules       => \@rules,
    stripspaces => 3,
);
$p->filter(\*DATA, \*STDOUT);
__END__
<?xml version="1.0" encoding="ISO-8859-1"?>
<main>
<Element1>
<Element2>
etc.
Upvotes: 1
Reputation: 1364
First, remove the elements you don't want:
$data->{Element1}{Element2} = [
    grep { $_->{show_id} =~ /^ABC1[12]$/ } @{$data->{Element1}{Element2}}
];
And then write it out in XML format. (With NoAttr => 1, hashes are represented as nested elements instead of attributes.)
print XMLout($data, NoAttr => 1, RootName => "main");
You can pass KeepRoot => 1 to both XMLin and XMLout to handle the root element ("main") instead of passing RootName => "main". If you do so, use $data->{main}{Element1}{Element2}.
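Putting those pieces together, here is a rough end-to-end sketch with XML::Simple only. The sample XML is inlined (and shortened to three records) so the script is self-contained. ForceArray => ['Element2'] is my addition, so that Element2 is an array reference even when only one record is present, and XMLDecl is passed so that the declaration is written back out. Note that XML::Simple keeps child elements in hashes, so in general it does not promise to preserve element order or the attribute/element distinction; treat the output as close to the input for this simple sample, not as a guaranteed round trip.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Sample input, inlined so the sketch is self-contained.
my $xml_in = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<main>
<Element1>
<Element2>
<show/>
<show_id>ABC10</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
<Element2>
<show/>
<show_id>ABC11</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
<Element2>
<show/>
<show_id>ABC12</show_id>
<staring>
<show_header>This is a test</show_header>
</staring>
</Element2>
</Element1>
</main>
XML

# The show_id codes to keep (in practice, loaded from the CSV file).
my %keep = map { $_ => 1 } qw(ABC11 ABC12);

# KeepRoot keeps the <main> level in the data structure;
# ForceArray makes Element2 an array ref even for a single record.
my $data = XMLin(
    $xml_in,
    KeepRoot   => 1,
    ForceArray => ['Element2'],
);

# Keep only the records whose show_id is in %keep.
my $parent = $data->{main}{Element1};
$parent->{Element2} = [
    grep { $keep{ $_->{show_id} } } @{ $parent->{Element2} }
];

# NoAttr writes hash values as nested elements rather than attributes.
my $xml_out = XMLout(
    $data,
    KeepRoot => 1,
    NoAttr   => 1,
    XMLDecl  => '<?xml version="1.0" encoding="ISO-8859-1"?>',
);
print $xml_out;
```

To process a real file, pass the file name to XMLin in place of $xml_in, and add OutputFile => 'filtered.xml' (a hypothetical name) to the XMLout options to write a file instead of printing.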
Upvotes: 1