Reputation: 59
Sorry for asking again; I am new to Perl and to programming in general. My actual problem is this: I extract some nodes from several files and collect them in a string. That string contains repeated nodes, which I need to delete. Following your suggestion, I tried this:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use XML::LibXML::Reader;
@ARGV = ("c:/main/work");    # @ARGV is a global and cannot be declared with "my"
die "Need directories\n" unless @ARGV;
my $all="<DTCSpecification xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n";
find(
    sub {
        return unless ( /(_tr\.xml)$/ and -f );
        extract_information();
        return;
    },
    @ARGV
);
my $elem;
sub extract_information {
    my $path = $_;
    if ( my $reader = XML::LibXML::Reader->new( location => $path )) {
        while ( $reader->nextElement( 'university' )) {
            $elem = $reader->readOuterXml();
            $all=$all.$elem;
        }
    }
    return;
}
my $so="</DTCSpecification>";
$all= $all.$so;
my $doc = XML::LibXML->load_xml( string=>$all);
my %seen;
foreach my $uni ( $doc->findnodes('//university') ) {
    my $name = $uni->find('Code');
    print "'$name' duplicated\n",
        $uni->unbindNode() if $seen{$name}++; # Remove if seen before
}
$all = $all.$doc->toString;
print $all;
It prints the data twice: first with the repeated information, and then again without it. I have tried a lot, but I can't understand why it prints twice. How can I eliminate the output that still contains the repeated information? Also, it removes nodes according to the "Code" element, so a node is removed when its Code occurs a second time. But sometimes the useful information is in the second occurrence, not the first. How should I overcome this? Could you help me? I am very sorry for asking again and again and taking up your valuable time.
Upvotes: 0
Views: 508
Reputation: 37136
It's not clear why XML::LibXML::Reader is needed here. Perhaps a little more information would help in this regard.
Here's how I would do it though:
use strict;
use warnings;
use XML::LibXML;
my $file = 'universities.xml';
my $doc = XML::LibXML->load_xml( location => $file );
my %seen;
foreach my $uni ( $doc->findnodes('//university') ) { # 'university' nodes only
    my $name = $uni->find('name');
    print "'$name' duplicated\n",
        $uni->unbindNode() if $seen{$name}++; # Remove if seen before
}
$doc->toFile('universities.xml'); # Print to file
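If the later occurrence is the one worth keeping, as the question mentions, one option is to walk the nodes in reverse, so that the last node for each key is seen first and the earlier ones are removed. A minimal sketch, assuming the key is the text of a `Code` child element; the sub name `dedupe_keep_last` and the sample XML are made up for illustration:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: keep only the LAST <university> for each <Code> value.
sub dedupe_keep_last {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml );

    my %seen;
    # Walk in reverse document order so the last occurrence is seen first.
    foreach my $uni ( reverse $doc->findnodes('//university') ) {
        my $code = $uni->findvalue('Code');
        $uni->unbindNode() if $seen{$code}++;    # Remove earlier duplicates
    }
    return $doc->toString;    # Serialize only the cleaned document
}

# Made-up sample data for illustration.
my $xml = <<'XML';
<DTCSpecification>
<university><Code>A1</Code><info>first</info></university>
<university><Code>B2</Code><info>only</info></university>
<university><Code>A1</Code><info>second</info></university>
</DTCSpecification>
XML

print dedupe_keep_last($xml);
```

As for the double output in the question's script: `$all` still holds the original, un-deduplicated string when `$doc->toString` is appended to it, so both versions get printed. Printing `$doc->toString` on its own avoids that.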
Upvotes: 6