dellair


Perl to find unmatched fields in two files and in one file

I have two txt files:

fileA.txt (tab as delimiter)

field1  field2
A       value1
A       value2
A       value3
B       value112
C       value33
D       value11
E       value3
B       value23
E       value5

fileB.txt (tab as delimiter)

field1  field2
A       value1
A       value3
M       value9
B       value5
C       value33

I want the script to report:

  1. In fileB.txt, if the same field1 appears with different field2 values
    report: A value1 and A value3
  2. In fileB.txt, if a field2 differs from every field2 that fileA.txt lists for the same field1
    report: B value5

So the output of the script is supposed to be:

A       value1
A       value3
B       value5

My script to accomplish #2:

#!/usr/bin/perl -w

my $fileA="/tmp/fileA.txt";
my $fileB="/tmp/fileB.txt";
my @orifields;
my @tmp;
my @unmatched;

open(FHD, "$fileB") || die "Unable to open $fileB: $!\n";
@orifields = <FHD>;
close(FHD);
chomp(@orifields);

open(FHD, "$fileA") || die "Unable to open $fileA: $!\n";
@tmp = <FHD>;
close(FHD);
chomp(@tmp);
foreach my $line (@tmp){
   print("Each line in fileA: $line\n");
}

foreach my $line (@orifields) {
   my ($field1, $field2) = split(/\t/, $line);
   print("Field1 is: $field1, Field2 is: $field2\n");

   if (! grep(/$line/, @tmp)) {
      if (grep(/$field1/,@tmp)) {
         push(@unmatched,"$line");
      }
   }
}

print("Unmatched: @unmatched\n");

Is there a clean way to handle both cases in one script without duplicating variables? Thanks in advance.


Answers (1)

choroba


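Use hashes to remember the contents of the files: a hash of hashes for fileA.txt, so each (field1, field2) pair can be looked up directly, and a hash of arrays for fileB.txt, so the field2 values seen for each field1 can be counted: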

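#! /usr/bin/perl
use warnings;
use strict;

# Remember every (field1, field2) pair seen in fileA.txt.
my %hash_a;
open my $FA, '<', 'fileA.txt' or die $!;
while (<$FA>) {
    chomp;
    my ($f1, $f2) = split /\t/;
    undef $hash_a{$f1}{$f2};    # only the keys matter, so store undef
}
close $FA;

# Collect the distinct field2 values of fileB.txt per field1, in file order.
my (%hash_b, %seen_b);
open my $FB, '<', 'fileB.txt' or die $!;
while (<$FB>) {
    chomp;
    my ($f1, $f2) = split /\t/;
    # Skip exact repeats so a duplicated line is not reported as a conflict.
    push @{ $hash_b{$f1} }, $f2 unless $seen_b{$f1}{$f2}++;

    # #2: field1 is known from fileA.txt, but not with this field2.
    if (exists $hash_a{$f1} && ! exists $hash_a{$f1}{$f2}) {
        print "#2: $f1 $f2\n";
    }
}
close $FB;

# #1: field1 values that occur with more than one distinct field2 in fileB.txt.
for my $key (grep @{ $hash_b{$_} } > 1, keys %hash_b) {
    print join(' ', "#1: $key", @{ $hash_b{$key} }), "\n";
}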
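
If the report should look exactly like the output in the question (one tab-separated line per offending row, in fileB.txt order, without the #1/#2 labels), the same hash idea can be wrapped in a function. This is only a sketch: the helper names parse_pairs and find_unmatched are made up for illustration, and the file contents are inlined as strings so that the example runs without the two files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split tab-delimited text into [field1, field2] pairs.
sub parse_pairs {
    my ($text) = @_;
    my @pairs;
    for my $line (split /\n/, $text) {
        my ($f1, $f2) = split /\t/, $line;
        next unless defined $f2;    # skip blank or malformed lines
        push @pairs, [$f1, $f2];
    }
    return @pairs;
}

# Hypothetical helper: return the fileB rows that break either rule,
# in fileB order, each row reported once.
sub find_unmatched {
    my ($text_a, $text_b) = @_;

    my (%in_a, %in_b);
    $in_a{ $_->[0] }{ $_->[1] } = 1 for parse_pairs($text_a);
    my @rows_b = parse_pairs($text_b);
    $in_b{ $_->[0] }{ $_->[1] } = 1 for @rows_b;

    my (@report, %reported);
    for my $row (@rows_b) {
        my ($f1, $f2) = @$row;
        # Rule 1: the same field1 has several distinct field2 values in fileB.
        my $conflict_in_b = scalar(keys %{ $in_b{$f1} }) > 1;
        # Rule 2: field1 is present in fileA, but never with this field2.
        my $differs_from_a = exists $in_a{$f1} && !exists $in_a{$f1}{$f2};
        next unless $conflict_in_b || $differs_from_a;
        push @report, "$f1\t$f2" unless $reported{"$f1\t$f2"}++;
    }
    return @report;
}

# Sample data copied from the question (header lines included).
my $text_a = join "\n",
    "field1\tfield2", "A\tvalue1", "A\tvalue2", "A\tvalue3", "B\tvalue112",
    "C\tvalue33", "D\tvalue11", "E\tvalue3", "B\tvalue23", "E\tvalue5";
my $text_b = join "\n",
    "field1\tfield2", "A\tvalue1", "A\tvalue3", "M\tvalue9", "B\tvalue5",
    "C\tvalue33";

# Prints the three rows from the question: A value1, A value3, B value5.
print "$_\n" for find_unmatched($text_a, $text_b);
```

The header row passes through unreported here only because it is identical in both files. With real files it may be safer to skip the first line explicitly.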
