user610151

Reputation: 1

Perl - Compare two text files and then match only the difference found on the first file

I'm trying to write a script that prints only the lines found in the first file but not in the second file.

For example the first text file contains:

a
b
c
d

While the second file contains:

a
x
y
z

The script I have so far prints the differences from both files:

b
c
d
x
y
z

But the result I want, and can't figure out how to produce, is just:

b
c
d

Here is the code:

use strict;
use warnings;

my $f1 = 'C:\Strawberry\new.raw';
my $f2 = 'C:\Strawberry\orig.raw';
my $outfile = 'C:\Strawberry\mt_deleted.txt';
my %results = ();

open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){
 $results{$line}=1;
}
close(FILE1);

open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
 $results{$line}++;
}
close(FILE2);


open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
 print OUTFILE $line if $results{$line} == 1;
}
close OUTFILE;

Upvotes: 0

Views: 528

Answers (3)

ikegami

Reputation: 385575

Let's start by forming a lookup table of what's in file 2.

my %seen;
while (<$fh2>) {
   chomp;
   ++$seen{$_};
}

To print each line of file 1 not found in file 2, simply process file 1 line by line and print the line if it's not in the lookup table.

while (<$fh1>) {
   chomp;
   say if !$seen{$_};
}

You said the files could have duplicate lines, but you didn't say how you wanted to handle them. The above handles duplicates as follows:

File 1:

a
a
a
b
c

File 2:

c
a

Output:

b
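
For reference, the two snippets above can be combined into one runnable script. This is only a minimal sketch: the sub name `lines_only_in_first` is hypothetical, and in-memory filehandles stand in for the real files so the example reproduces the duplicate-handling case shown above. Note that `say` needs `use feature 'say'`.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the answer): returns the lines read from
# $fh1 that never occur in $fh2, in the order they appear in file 1.
sub lines_only_in_first {
    my ($fh1, $fh2) = @_;

    # Build the lookup table from file 2.
    my %seen;
    while (my $line = <$fh2>) {
        chomp $line;
        ++$seen{$line};
    }

    # Keep each line of file 1 that is absent from the lookup table.
    my @only;
    while (my $line = <$fh1>) {
        chomp $line;
        push @only, $line if !$seen{$line};
    }
    return @only;
}

# Assumption: in-memory filehandles replace the real files for the demo.
my $file1 = "a\na\na\nb\nc\n";
my $file2 = "c\na\n";
open(my $fh1, '<', \$file1) or die "Cannot open in-memory file 1: $!";
open(my $fh2, '<', \$file2) or die "Cannot open in-memory file 2: $!";

say for lines_only_in_first($fh1, $fh2);    # prints: b
```

To use it on real files, open `$fh1` and `$fh2` on the file paths instead of the scalar references.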

Upvotes: 1

ikegami

Reputation: 385575

Let's start by counting the number of occurrences of each line in file 2.

my %counts;
while (<$fh2>) {
   chomp;
   ++$counts{$_};
}

To print each line of file 1 not matched by a line in file 2, simply process file 1 line by line, decrementing the count, and printing the line if the count is negative.

while (<$fh1>) {
   chomp;
   say if --$counts{$_} < 0;
}

You said the files could have duplicate lines, but you didn't say how you wanted to handle them. The above handles duplicates as follows:

File 1:

a
a
a
b
c

File 2:

c
a

Output:

a
a
b
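
The counting variant can likewise be put together as a self-contained script. Again this is a sketch under assumptions: the sub name `unmatched_in_first` is hypothetical, and in-memory filehandles stand in for the real files. Each line in file 2 cancels out at most one matching line in file 1.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the answer): returns the lines of $fh1
# that are left over after pairing each one off against a line of $fh2.
sub unmatched_in_first {
    my ($fh1, $fh2) = @_;

    # Count how many times each line occurs in file 2.
    my %counts;
    while (my $line = <$fh2>) {
        chomp $line;
        ++$counts{$line};
    }

    # Each line of file 1 uses up one occurrence; once the count goes
    # negative, file 2 has no line left to match it.
    my @unmatched;
    while (my $line = <$fh1>) {
        chomp $line;
        push @unmatched, $line if --$counts{$line} < 0;
    }
    return @unmatched;
}

# Assumption: in-memory filehandles replace the real files for the demo.
my $file1 = "a\na\na\nb\nc\n";
my $file2 = "c\na\n";
open(my $fh1, '<', \$file1) or die "Cannot open in-memory file 1: $!";
open(my $fh2, '<', \$file2) or die "Cannot open in-memory file 2: $!";

say for unmatched_in_first($fh1, $fh2);    # prints: a, a, b (one per line)
```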

Upvotes: 1

Kamal Nayan

Reputation: 1940

You need to add chomp and assign a different value to the keys from file2, so that lines appearing only in file2 no longer end up with the value 1:

use strict;
use warnings;

my $f1      = 'C:\Strawberry\new.raw';
my $f2      = 'C:\Strawberry\orig.raw';
my $outfile = 'C:\Strawberry\mt_deleted.txt';
my %results = ();

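# Mark every line of file 1 with the value 1.
open( my $fh1, '<', $f1 ) or die "Could not open $f1: $!\n";
while ( my $line = <$fh1> ) {
    chomp $line;
    $results{$line} = 1;
}
close($fh1);

# Mark every line of file 2 with the value 2, overwriting the 1 for shared lines.
open( my $fh2, '<', $f2 ) or die "Could not open $f2: $!\n";
while ( my $line = <$fh2> ) {
    chomp $line;
    $results{$line} = 2;
}
close($fh2);

# Lines still marked 1 occur only in file 1. Note that keys() returns them in no particular order.
open( my $out, '>', $outfile ) or die "Cannot open $outfile for writing: $!\n";
foreach my $line ( keys %results ) {
    print {$out} "$line\n" if $results{$line} == 1;
}
close($out);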

Upvotes: 1
