Reputation: 517
I'm using two text files sampleA.txt and sampleB.txt. I have two fields in each file and I need to compare first record(first row) of sampleA.txt with the first row of sampleB.txt and I want to show matching records as well as miss matching records in command prompt.I need to do that in Perl.
Using the below script I'm getting one output but it is wrong. I need to populate both matching as well as mismatching. How to do that?
sampleA.txt:
1|X
2|A
4|Z
5|A
sampleB.txt:
2|A
2|X
3|B
4|C
Output I'm getting:
2|A
2|X
4|C
Outputs I want:
Matching-Output:
2|A
Miss-matching-Output:
1|X
4|Z
5|A
3|B
4|C
Perl Script:
#!/usr/bin/perl
use strict;
use warnings;
open(FILE1,'C:\Users\sathiya.kumar\Desktop\sampleA.txt') || die $!;
open(FILE2,'C:\Users\sathiya.kumar\Desktop\sampleB.txt') || die $!;
my $interline;
while (my $line= <FILE1>) {
my @fields = split('\|',$line);
parser($fields[0]);
}
sub parser {
my $mergeid = shift;
while (defined $interline || ($interline= <FILE2>)) {
my @fields = split('\|',$interline);
my $key = $fields[0];
if ($key lt $mergeid) {
# Skip non-matching records
$interline = undef;
next;
} elsif ($key gt $mergeid) {
# wait for next key
last;
} else {
print $interline;
$interline = undef;
}
}
}
close(FILE1);
close(FILE2);
Let me know if you need more information.
Upvotes: 1
Views: 74
Reputation: 48629
You left out 2|X
:
use strict;
use warnings;
use 5.016;
use Data::Dumper;
#Create a set from the entries in sampleA.txt:
my $fname = 'sampleA.txt';
open my $A_INFILE, '<', $fname
or die "Couldn't open $fname: $!";
my %a;
while (my $line = <$A_INFILE>) {
chomp $line;
$a{$line} = undef;
}
close $A_INFILE;
say Dumper(\%a);
#Create a set from the entries in sampleB.txt:
$fname = 'sampleB.txt';
open my $B_INFILE, '<', $fname
or die "Couldn't open $fname: $!";
my %b;
while (my $line = <$B_INFILE>) {
chomp $line;
$b{$line} = undef;
}
close $B_INFILE;
say Dumper(\%b);
#Divide the entries in both files into matches and mismatches:
my (@matches, @mismatches);
for my $a_val (keys %a) {
if (exists $b{$a_val}) {
push @matches, $a_val;
}
else {
push @mismatches, $a_val;
}
}
for my $b_val (keys %b) {
if (not exists $a{$b_val}) {
push @mismatches, $b_val;
}
}
say Dumper(\@matches);
say Dumper(\@mismatches);
--output:--
$VAR1 = {
'5|A' => undef,
'4|Z' => undef,
'1|X' => undef,
'2|A' => undef
};
$VAR1 = {
'2|X' => undef,
'3|B' => undef,
'4|C' => undef,
'2|A' => undef
};
$VAR1 = [
'2|A'
];
$VAR1 = [
'5|A',
'4|Z',
'1|X',
'2|X',
'3|B',
'4|C'
];
If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to find out whether Perl's internal hashing algorithm is performing poorly on your data set. For example, you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16" , which means only one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't supposed to happen. If a tied hash is evaluated in scalar context, the SCALAR method is called (with a fallback to FIRSTKEY ).
http://perldoc.perl.org/perldata.html
Upvotes: 1