user2418702

Reputation: 1

Compare file lines for match anywhere in second file

This is frustrating. I have two text files that each contain just one phone number per line. I need to read each line from file1 and search file2 for a match. If there is no match, write the line to an output file. I've been trying this, but I know it's wrong.

$file1 = 'pokus1.txt';
$file2 = 'pokus2.txt';

open (F1, $file1) || die ("Could not open $file1!");
open (F2, $file2) || die ("Could not open $file2!");
open (OUTFILE, '>>output\output_x1.txt');
@f1data = <F1>;
@f2data = <F2>;

while (@f1data){
    @grp = grep {/$f1data/} @f2data;

    print OUTFILE "$grp";
}
close (F1);
close (F2);
close (OUTFILE);

I hope someone can help. Thanks, Brent

Upvotes: 0

Views: 2513

Answers (3)

David W.

Reputation: 107090

Whenever you get an "is a piece of data from one group also in another group" type of question (and they come up quite a bit), you should think in terms of hashes.

A hash is a keyed lookup. Let's say you create a hash keyed on say... I don't know... phone numbers taken from file #1. If you read a line in file #2, you can easily see if it's in file #1 by simply looking at the hash. Fast, efficient.

use strict;   #ALWAYS ALWAYS ALWAYS
use warnings; #ALWAYS ALWAYS ALWAYS

use autodie;  #Will end the program if files you try to open don't exist

# Constants are a great way of storing data that is ...uh... constant
use constant {
    FILE_1    =>  "a1.txt",
    FILE_2    =>  "a2.txt",
};

my %phone_hash;

open my $phone_num1_fh, "<", FILE_1;

#Let's build our phone number hash
while ( my $phone_num = <$phone_num1_fh> ) {
    chomp $phone_num;
    $phone_hash{ $phone_num } = 1;   #Doesn't really matter, but best not a zero value
}
close $phone_num1_fh;

#Now that we have our phone hash, let's see if it's in file #2
open my $phone_num2_fh, "<", FILE_2;
while ( my $phone_num = <$phone_num2_fh> ) {
    chomp $phone_num;
    if ( exists $phone_hash{ $phone_num } ) {
        print "$phone_num is in file #1 and file #2\n";
    }
    else {
        print "$phone_num is only in file #2\n";
    }
}
close $phone_num2_fh;

See how nicely that works? The only issue is that this won't report phone numbers that are in file #1 but not in file #2. You could solve this by simply creating a second hash for all the phone numbers in file #2.

Let's do this one more time with two hashes:

my %phone_hash1;
my %phone_hash2;

open my $phone_num1_fh, "<", FILE_1;

while ( my $phone_num = <$phone_num1_fh> ) {
    chomp $phone_num;
    $phone_hash1{ $phone_num } = 1;
}
close $phone_num1_fh;

open my $phone_num2_fh, "<", FILE_2;

while ( my $phone_num = <$phone_num2_fh> ) {
    chomp $phone_num;
    $phone_hash2{ $phone_num } = 1;
}
close $phone_num2_fh;

Now, we'll use keys to list the keys of the first hash and go through them. I'm going to add a phone number to an %in_common hash whenever it is in both hashes:

my %in_common;

for my $phone ( keys %phone_hash1 ) {
    if ( $phone_hash2{$phone} ) {
        $in_common{$phone} = 1;    #Phone numbers in common between the two lists
    }
}

Now I have three hashes, %phone_hash1, %phone_hash2, and %in_common, and can report on each group:

for my $phone ( sort keys %phone_hash1 ) {
    if ( not $in_common{$phone} ) {
        print "Phone number $phone is only in the first file\n";
    }
}

for my $phone ( sort keys %phone_hash2 ) {
    if ( not $in_common{$phone} ) {
        print "Phone number $phone is only in " . FILE_2 . "\n";
    }
}

for my $phone ( sort keys %in_common ) {
    print "Phone number $phone is in both files\n";
}

Note that in this example, I didn't use exists to see if the key exists in the hash. That is, I simply wrote if ( $phone_hash2{$phone} ) instead of if ( exists $phone_hash2{$phone} ). The exists form checks whether the key is present in the hash, even if its value is a null string, numerically zero, or undefined.

The form without exists is true only as long as the value is not zero, a null string, or undefined. Since I purposefully set the value of each hash entry to 1, I can use this form. It's a good habit to use exists because there will be situations where a valid value could be a null string or zero. However, some people like the way the code reads without exists when possible.
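To make the difference concrete, here is a minimal sketch with made-up numbers. The %seen hash and the describe helper are hypothetical names used only for illustration; one key deliberately has the value 0:

```perl
use strict;
use warnings;

# Hypothetical data: one key with a true value, one key with a false value (0).
my %seen = ( '555-0100' => 1, '555-0199' => 0 );

# Reports how the two tests treat a given key.
sub describe {
    my ($key) = @_;
    my $exists = exists( $seen{$key} ) ? 'yes' : 'no';   # is the key present at all?
    my $true   = $seen{$key}           ? 'yes' : 'no';   # is the stored value true?
    return "$key: exists=$exists true=$true";
}

print describe($_), "\n" for '555-0100', '555-0199', '555-0123';
```

This prints exists=yes true=yes for 555-0100, exists=yes true=no for 555-0199 (the key is there, but its value is 0), and exists=no true=no for 555-0123, which is not in the hash at all.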

Upvotes: 1

michael501

Reputation: 1482

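In bash (-F treats each line as a fixed string, -x requires a whole-line match):

Lines in file1 that do not exist in file2:

grep -Fxvf file2 file1 > file3

Lines shared by both files:

grep -Fxf file1 file2 > file4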
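One caveat with pattern-based matching, which applies to the grep {/.../} attempt in the question as well: a pattern matches anywhere in a line, so a short number can match inside a longer one. Here is a small Perl sketch of the difference, using made-up numbers; the helper names are hypothetical:

```perl
use strict;
use warnings;

# Made-up sample data standing in for the lines of the second file.
my @file2 = ( '15551234567', '5559876' );

# Pattern match: counts lines that contain $number anywhere inside them.
sub matches_as_pattern {
    my ($number) = @_;
    my $count = grep { /\Q$number\E/ } @file2;
    return $count;
}

# Exact match: counts lines that are equal to $number as a whole.
sub matches_exactly {
    my ($number) = @_;
    my $count = grep { $_ eq $number } @file2;
    return $count;
}

for my $number ( '5551234', '5559876' ) {
    printf "%s pattern=%d exact=%d\n",
        $number, matches_as_pattern($number), matches_exactly($number);
}
```

Here 5551234 is reported as a pattern match (it sits inside 15551234567) but not as an exact match, while 5559876 matches both ways. For a list of phone numbers, whole-line comparison is usually what you want.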

Upvotes: 2

Birei

Reputation: 36282

A customary solution is to process one file, saving its data as keys of a hash, and then process the other file, checking whether each of its lines exists as a key:

#!/usr/bin/env perl

use warnings;
use strict;

my (%phone);

open my $fh1, '<', shift or die;
open my $fh2, '<', shift or die;
##open my $ofh, '>>', shift or die;

while ( <$fh2> ) { 
    chomp;
    $phone{ $_ } = 1;
}

while ( <$fh1> ) { 
    chomp;
    next if exists $phone{ $_ };
    ##printf $ofh qq|%s\n|, $_;
    printf qq|%s\n|, $_;
}

exit 0;

Run it like:

perl script.pl file1 file2 > outfile
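If you would rather have the script write the output file itself, as the code in the question tries to do, the same idea can be sketched as follows. The only_in_first and read_lines helpers are hypothetical names, and the script assumes it is given three file names (first file, second file, output file) on the command line:

```perl
use strict;
use warnings;

# Returns the entries of @$first that do not appear in @$second,
# keeping the order of @$first.
sub only_in_first {
    my ( $first, $second ) = @_;
    my %in_second;
    $in_second{$_} = 1 for @$second;
    return grep { !exists $in_second{$_} } @$first;
}

# Reads a file and returns its lines with the trailing newlines removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    chomp( my @lines = <$fh> );
    close $fh;
    return @lines;
}

# Only touch the filesystem when three file names are given.
if ( @ARGV == 3 ) {
    my ( $file1, $file2, $outfile ) = @ARGV;
    my @missing = only_in_first( [ read_lines($file1) ], [ read_lines($file2) ] );
    open my $out, '>', $outfile or die "Could not open $outfile: $!";
    print {$out} "$_\n" for @missing;
    close $out or die "Could not close $outfile: $!";
}
```

Run it like perl script.pl file1 file2 outfile. This opens the output with '>' so each run starts with a fresh file; use '>>' instead if you want to append, as the original code did.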

Upvotes: 1
