Reputation: 839
I want to compare two big files with different column and row numbers and print those line which have a common word (like KJ
):
file1:
XT1 123 aa NR
XT2 444 bb GF
XT3 666 aa KJ
file2
fc KK pcn
wd CC KJ
output
XT3 666 aa wd CC KJ
I tried but I didn't get anything:
awk 'FNR==NR{a[$4]=$3;next}{if (a[$3])print a[$3],$0}' file1 file2
Thank you in advance for your help
Upvotes: 1
Views: 1149
Reputation: 385655
Based on the limited info you provided,
my $file1 = '...';
my $file2 = '...';
my %file2;
{
open(my $fh2, '<', $file2)
or die("Can't open \"$file2\": $!\n');
while (<$fh2>) {
my @F = split;
$file2{$F[2]} = join(' ', @F[0,1]);
}
}
{
open(my $fh1, '<', $file1)
or die("Can't open \"$file1\": $!\n');
while (<$fh1>) {
my @F = split;
print(join(' ', @F[0..2], $file2{$F[3]}, $F[3]), "\n")
if $file2{$F[3]};
}
}
I assumed the following:
file2
fits in memory as a hash of lines.file2
.file1
with the 3rd column of file2
.It maintains the order of the lines as they appear in file1
.
Upvotes: 2
Reputation: 54392
You were close, try this:
awk 'FNR==NR { a[$4]=$1 FS $2 FS $3; next } $3 in a { print a[$3], $0 }' file1 file2
Results:
XT3 666 aa wd CC KJ
Quick explanation:
In 'file1', add column 4 to an array with columns 1, 2 and 3 as it's values.
In 'file2', check if column 3 is in the array and if it is, print out it's value and the current line.
Upvotes: 2
Reputation: 3484
I'd create a separate hash of words to lines numbers for each file, and an array storing each line, for each file.
then I'd iterate of the list of words in file 1, and look for a match in file 2. If I find a match, then I'd look up the line numbers for the word in both files. Using the line number, I'd retrieve the "lines" form the arrays, and return the list of unique words.
Upvotes: 0