tstev
tstev

Reputation: 616

Join files based on two columns using awk

I have the following two files;

$ cat file1
1 4
2 5
3 6

$ cat file2
4 2 N1
4 1 Y1
6 2 N2
6 3 Y2
2 5 Y3

I am interested in the third column of file2. So I want to join the two files based on two first columns which are ID columns. After much searching (here, here or here for example) I tried a few things and it works in principle;

awk 'FNR==NR{a[$1,$2];next}; ($1, $2) in a || ($2, $1) in a{print $3}' file1 file2
Y1
Y2
Y3

Or,

awk 'FNR==NR{a[$1,$2]=$3;next}; ($1, $2) in a || ($2, $1) in a{print $0, a[$1,$2]}' file2 file1
1 4
2 5 Y3
3 6

However not my exact desired output;

1 4 Y1
2 5 Y3
3 6 Y2

The order of IDs in file1 is important as first column is for example males and second females. In file2 the columns maybe either males or females.

Upvotes: 1

Views: 747

Answers (2)

hek2mgl
hek2mgl

Reputation: 157947

Like this:

awk 'NR==FNR{s[$1 OFS $2]; next}
     ($2 OFS $1) in s {
         print $2, $1, $3
     }
     ($1 OFS $2) in s {
         print $1, $2, $3
     }' file file2

Upvotes: 1

George Vasiliou
George Vasiliou

Reputation: 6335

Not very short, but does the job:

$ awk 'FNR==NR{a[$1,$2]=$1 FS $2;a[$2,$1]=a[$1,$2];next}; ($1,$2) in a || ($2,$1) in a{print a[$1,$2],$3}' file1 file2

1 4 Y1
3 6 Y2
2 5 Y3

Alternative:

$ awk 'FNR==NR{a[$1,$2]=$1 FS $2;next}; {pr=0};($1,$2) in a {pr=a[$1,$2]};($2,$1) in a{pr=a[$2,$1]};pr{print pr,$3}' file1 file2

Upvotes: 1

Related Questions