EpiMan
EpiMan

Reputation: 839

compare two files with different column and row numbers

I want to compare two big files with different column and row numbers and print those line which have a common word (like KJ):

file1:

XT1 123 aa NR
XT2 444 bb GF 
XT3 666 aa KJ

file2

fc KK pcn
wd CC KJ

output

XT3 666 aa wd CC KJ

I tried but I didn't get anything:

awk 'FNR==NR{a[$4]=$3;next}{if (a[$3])print a[$3],$0}' file1 file2

Thank you in advance for your help

Upvotes: 1

Views: 1149

Answers (3)

ikegami
ikegami

Reputation: 385655

Based on the limited info you provided,

my $file1 = '...';
my $file2 = '...';

my %file2;
{
   open(my $fh2, '<', $file2)
      or die("Can't open \"$file2\": $!\n');
   while (<$fh2>) {
      my @F = split;
      $file2{$F[2]} = join(' ', @F[0,1]);
   }
}

{
   open(my $fh1, '<', $file1)
      or die("Can't open \"$file1\": $!\n');
   while (<$fh1>) {
      my @F = split;
      print(join(' ', @F[0..2], $file2{$F[3]}, $F[3]), "\n")
         if $file2{$F[3]};
   }
}

I assumed the following:

  • file2 fits in memory as a hash of lines.
  • A keyword doesn't appear twice in file2.
  • You're only interested in matching the 4th column of file1 with the 3rd column of file2.

It maintains the order of the lines as they appear in file1.

Upvotes: 2

Steve
Steve

Reputation: 54392

You were close, try this:

awk 'FNR==NR { a[$4]=$1 FS $2 FS $3; next } $3 in a { print a[$3], $0 }' file1 file2

Results:

XT3 666 aa wd CC KJ

Quick explanation:

In 'file1', add column 4 to an array with columns 1, 2 and 3 as it's values.

In 'file2', check if column 3 is in the array and if it is, print out it's value and the current line.

Upvotes: 2

Len Jaffe
Len Jaffe

Reputation: 3484

I'd create a separate hash of words to lines numbers for each file, and an array storing each line, for each file.

then I'd iterate of the list of words in file 1, and look for a match in file 2. If I find a match, then I'd look up the line numbers for the word in both files. Using the line number, I'd retrieve the "lines" form the arrays, and return the list of unique words.

Upvotes: 0

Related Questions