Compare two files of different columns and print different columns

Question

I want to compare 2nd column of file2 with 1st column of file1. If they are equal i want to add the 2nd column of file1 to file2 as shown in output.txt.

file2

chr5    ENST00000514151    utr5    0    +
chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +

file1

ENST00000512281    a
ENST00000504031    b
ENST00000776348    c

output.txt

chr5    a    ENST00000512281    utr5    0    +
chr5    a    ENST00000512281    utr5    0    +
chr5    a    ENST00000512281    utr5    0    +

I was able compare the files with

awk 'NR==FNR{a[$1];next}$2 in a{print}' file1 file2

This gives below output:

chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +

But I do not know how to add the 2nd colum of file1 into the output.

H&#229;kon H&#230;gland · Accepted Answer

You can store the value of $2 in file1 into the array using a[$1]=$2. So you could try:

awk '
   NR==FNR{ 
     a[$1]=$2 ; next }
   $2 in a {
     $1=$1 FS a[$2]
     print 
   }' file1 file2

Output:

chr5 b ENST00000504031 utr5 0 +
chr5 b ENST00000504031 utr5 0 +
chr5 a ENST00000512281 utr5 0 +
chr5 a ENST00000512281 utr5 0 +
chr5 a ENST00000512281 utr5 0 +

Explanation:

This modifies $1 in file2 using $1=$1 FS a[$2] where FS is the default field separator, which is a space.. and then rebuilds the record, such that it can be printed by print later..
The print can be simplified to a 1 if desired.. Like $2 in a { $1=$1 FS a[$2] }1
Note that this rebuilds the record in file2 and thus any sequences of spaces or tabs will be truncated to a single space in the output. To keep the original formatting in file2 one could use the split() function in Gnu Awk version 4..

Compare two files of different columns and print different columns

Answers (1)

Related Questions