Reputation: 31
I have XML data like this:
<ce:affiliation id="aff1">
<ce:label>a</ce:label>
<ce:textfn>Department of Urology, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands</ce:textfn>
<sa:affiliation>
<sa:organization>Department of Urology</sa:organization>
<sa:organization>Radboud University Nijmegen Medical Center</sa:organization>
<sa:city>Nijmegen</sa:city>
</sa:affiliation>
and so on.
Now I want to read the text inside sa:affiliation: take the text of each tag inside sa:affiliation and join it with ", " to build a string like "Department of Urology, Radboud University Nijmegen Medical Center, Nijmegen", then compare that string with the text inside <ce:textfn> ... </ce:textfn>.
I need to compare each ce:affiliation tag with its sa:affiliation in this way across multiple files, and report any mismatch to the user.
Upvotes: 1
Views: 200
Reputation: 31
I finally got the required output:
#!/usr/bin/perl
use strict;
use warnings;

# Skip output.xml so a report left by an earlier run is not scanned as input.
my @files = grep { $_ ne 'output.xml' } <*.xml>;
open my $out, '>', 'output.xml' or die "Cannot write output.xml: $!";

foreach my $file (@files) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $n = 1;    # current affiliation number (aff1, aff2, ...)
    while (my $line = <$in>) {
        # Several affiliations may sit on one line, so keep going while
        # the next id (aff$n) still occurs in this line.
        do {
            if ($line =~ /<ce:affiliation id="aff$n">(.+?)<\/ce:affiliation>/) {
                my $aff = $1;
                $aff =~ s/<ce:label>(.+?)<\/ce:label>//;    # the label is not compared
                if ($aff =~ /<ce:textfn>(.+?)<\/ce:textfn><sa:affiliation>(.+?)<\/sa:affiliation>/) {
                    my ($textfn, $sff) = ($1, $2);
                    # Turn each closing sa: tag into ", " and drop the opening tags.
                    $sff =~ s/<\/sa:(?:organization|city|country|state)>/, /g;
                    $sff =~ s/<sa:(?:organization|city|country|state)>//g;
                    $sff =~ s/, \z//;    # remove the trailing separator
                    print $out "$file ce:aff and sa:aff mismatch in aff$n\n"
                        if $textfn ne $sff;
                    print $out "$file check label aff$n\n"
                        if $textfn =~ /<ce:sup>/;
                }
                elsif ($aff !~ /<sa:affiliation>/ && $line =~ /"art520\.dtd"/) {
                    print $out "$file structured affiliation missing for aff$n\n";
                }
            }
            $n++;
        } while ($line =~ /aff$n/);
    }
    close $in;
}
close $out;
Upvotes: 0
Reputation: 31
I finally came up with this code, but is there a way to pick up the ce:affiliation and sa:affiliation text without using if/else conditions? It fails in some cases.
#!/usr/bin/perl
@files = <*.xml>;
open my $out, '>', 'output.xml' or die $!;
foreach $file (@files) {
    open (FILE, "$file");
    $a = 1;
    while (my $line = <FILE>) {
        do {
            if ($line =~ /<ce:affiliation id=\"aff$a\">(.+?)<ce:textfn>(.+?)<\/ce:textfn><sa:affiliation>(.+?)<\/sa:affiliation><\/ce:affiliation>/) {
                $count = $3;
                $textfn = $2;
                print ("$count\n");
                print ("$textfn\n");
                if ($count =~ /<\/sa:(.+?)>/) {
                    $count =~ s/<\/sa:organization>/, /g;
                    $count =~ s/<\/sa:city>/, /g;
                    $count =~ s/<\/sa:country>/, /g;
                    $count =~ s/<\/sa:state>/, /g;
                    $count =~ s/<sa:organization>//g;
                    $count =~ s/<sa:city>//g;
                    $count =~ s/<sa:country>//g;
                    $count =~ s/<sa:state>//g;
                    chop($count);
                    chop($count);
                    if ($count ne $textfn) {
                        print $out("$file affilliation $a is mismatch\n");
                    }
                }
            }
            else {
                if ($line =~ /<ce:affiliation id=\"aff$a\">(.+?)<ce:textfn>(.+?)<\/ce:textfn><\/ce:affiliation>/) {
                    print $out("$file sa:affilliation missing for $a\n");
                }
            }
            $a = $a + 1;
        } while ($line =~ /aff$a/);
    }
}
For this XML I get the wrong result:
<ce:affiliation id="aff1"><ce:label>a</ce:label><ce:textfn>Department of Urology, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands</ce:textfn><sa:affiliation><sa:organization>Department of Urology</sa:organization><sa:organization>Radboud University Nijmegen Medical Center</sa:organization><sa:city>Nijmegen</sa:city><sa:country>The Netherlands</sa:country></sa:affiliation></ce:affiliation><ce:affiliation id="aff2"><ce:textfn>Norris Comprehensive Cancer Center, University of Southern California Institute of Urology, Los Angeles, California</ce:textfn></ce:affiliation><ce:affiliation id="aff3"><ce:label>c</ce:label><ce:textfn>Department of Urology, Stanford University, Stanford, California</ce:textfn><sa:affiliation><sa:organization>Department of Urology</sa:organization><sa:organization>Stanford University</sa:organization><sa:city>Stanford</sa:city><sa:state>California</sa:state></sa:affiliation></ce:affiliation><ce:correspondence id="cor1"></article>
Upvotes: 0
Reputation: 16171
Your question is a bit vague. It is not clear where each fragment of XML goes. One file? Several files? One fragment per file? Several? If the data is in several files, how do you link a ce:affiliation element with the corresponding sa:affiliation, especially if what you are checking is whether the two texts match? Why is there no country in sa:affiliation? Where are the namespaces declared?
Assuming the 2 pieces of data are in 2 files, and the namespace prefixes do not change:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use Test::More;
my $DEFAULT_COUNTRY= "The Netherlands";
# usage is <tool> <ce file> <sa file>
my( $ce_file, $sa_file)= @ARGV;
my $ce= XML::Twig->new->parsefile( $ce_file)->root;
my $ce_text = $ce->field( 'ce:textfn');
my $sa= XML::Twig->new->parsefile( $sa_file)->root;
# add the country if not present
if( ! $sa->first_child( 'sa:country'))
{ $sa->insert_new_elt( last_child => 'sa:country' => $DEFAULT_COUNTRY); }
my $sa_text= join( ', ', $sa->children_text);
is( $ce_text, $sa_text, "checking " . $ce->id);
done_testing();
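If instead each ce:affiliation carries its own sa:affiliation in the same file, as in the single-line sample posted above, a handler-based variant might look like the following. This is only a sketch under a few assumptions: the sub name check_affiliations and the <root> wrapper element are made up for illustration, the prefixes are matched literally (no namespace processing), and the structured elements are written without whitespace between them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns one message per ce:affiliation whose
# ce:textfn differs from the comma-joined children of its sa:affiliation,
# or that has no sa:affiliation at all.
sub check_affiliations {
    my ($xml) = @_;
    my @problems;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once for every ce:affiliation, so no id counter is needed.
            'ce:affiliation' => sub {
                my ($t, $aff) = @_;
                my $id = $aff->att('id') // '(no id)';
                my $sa = $aff->first_child('sa:affiliation');
                if (!$sa) {
                    push @problems, "$id: sa:affiliation missing";
                    return;
                }
                my $expected = join ', ', $sa->children_text;
                my $actual   = $aff->field('ce:textfn');
                push @problems, "$id: ce:textfn and sa:affiliation mismatch"
                    if $actual ne $expected;
            },
        },
    );
    $twig->parse($xml);
    return @problems;
}

# Sample data taken from the question; <root> is an assumed wrapper.
my $sample = <<'XML';
<root>
<ce:affiliation id="aff1"><ce:label>a</ce:label><ce:textfn>Department of Urology, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands</ce:textfn><sa:affiliation><sa:organization>Department of Urology</sa:organization><sa:organization>Radboud University Nijmegen Medical Center</sa:organization><sa:city>Nijmegen</sa:city></sa:affiliation></ce:affiliation>
<ce:affiliation id="aff2"><ce:textfn>Norris Comprehensive Cancer Center, University of Southern California Institute of Urology, Los Angeles, California</ce:textfn></ce:affiliation>
<ce:affiliation id="aff3"><ce:label>c</ce:label><ce:textfn>Department of Urology, Stanford University, Stanford, California</ce:textfn><sa:affiliation><sa:organization>Department of Urology</sa:organization><sa:organization>Stanford University</sa:organization><sa:city>Stanford</sa:city><sa:state>California</sa:state></sa:affiliation></ce:affiliation>
</root>
XML

print "$_\n" for check_affiliations($sample);
```

Because the handler fires for each ce:affiliation element regardless of its id or of how many elements share a line, the missing-label and missing-sa:affiliation cases do not need separate regex branches. For real files, $twig->parsefile($file) could replace the parse call.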
Upvotes: 2
Reputation: 160
You can use XML::XPath to find the nodes you want. Then just check whether the two nodes' string_value results differ, using Perl's ne operator.
Upvotes: 1