joining 2 files on matching column values using awk

Question

I know similar questions have been posted, but I'm still having trouble getting the output I want with awk's FNR==NR idiom. I have two files like these:

File 1: 
123|this|is|good
456|this|is|better
...

File 2:
aaa|123
bbb|456
...

I want to join file 2, column 2 against file 1, column 1, and output columns 2, 3 and 4 of file 1 along with column 1 of file 2.

Thanks in advance.

Wintermute · Accepted Answer

With awk you could do something like

awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1

NF is the number of fields in a record (line by default), so $NF is the last field, and $(NF + 1) is the field after that. By assigning the saved value from the pass over file2 to it, a new field is appended to the record before it is printed.
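
To see the append in isolation, here is a minimal sketch that feeds one sample record through awk and assigns a literal to $(NF + 1). The value "extra" and the variable name appended are made up for the illustration; they are not part of the command above.

```shell
# Append one field to a single sample record.
# "extra" is a placeholder value chosen for this demo only.
appended=$(printf '123|this|is|good\n' |
  awk -F '|' 'BEGIN { OFS = FS } { $(NF + 1) = "extra"; print }')

# Assigning to a field makes awk rebuild the record with OFS,
# which is why OFS is set to the same "|" as FS.
echo "$appended"
```

Without the OFS = FS assignment, the rebuilt record would be joined with spaces instead of pipes.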

One thing to note: this behaves like an inner join, i.e., only records whose key appears in both files are printed. To make this a right join (one that keeps every record of file1, matched or not), you can use

awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } { $(NF + 1) = val[$1]; print }' file2 file1

That is, you can drop the $1 in val condition on the append-and-print action. If $1 is not in val, val[$1] is empty, and an empty field will be appended to the record before printing.
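
The difference between the two variants only shows up when file1 has a key that file2 lacks. The sketch below runs both commands against throwaway copies of the sample data; the 789 row is invented for the demo, and it assumes mktemp -d is available.

```shell
# Throwaway sample files; the 789 row is a made-up record with no match in file2.
dir=$(mktemp -d)
printf '123|this|is|good\n456|this|is|better\n789|this|is|unmatched\n' > "$dir/file1"
printf 'aaa|123\nbbb|456\n' > "$dir/file2"

# Inner join: the "$1 in val" condition filters out unmatched keys.
inner=$(awk -F '|' 'BEGIN { OFS = FS }
  NR == FNR { val[$2] = $1; next }
  $1 in val { $(NF + 1) = val[$1]; print }' "$dir/file2" "$dir/file1")

# Without the condition every file1 record is printed;
# unmatched keys get an empty trailing field.
outer=$(awk -F '|' 'BEGIN { OFS = FS }
  NR == FNR { val[$2] = $1; next }
  { $(NF + 1) = val[$1]; print }' "$dir/file2" "$dir/file1")

printf 'inner:\n%s\nouter:\n%s\n' "$inner" "$outer"
rm -r "$dir"
```

In the second result the 789 line ends in a bare |, which is the empty appended field.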

But it's probably better to use join, which expects both files to be sorted on their join fields:

join -1 1 -2 2 -t \| file1 file2
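
If the files are not already in key order, something along these lines should work. It is only a sketch: the sample rows are deliberately shuffled, the .sorted file names are made up, and it assumes mktemp -d is available and that sort and join run under the same locale.

```shell
dir=$(mktemp -d)
# Sample rows, with file1 deliberately out of key order.
printf '456|this|is|better\n123|this|is|good\n' > "$dir/file1"
printf 'aaa|123\nbbb|456\n' > "$dir/file2"

# Sort each file on its own join field:
# column 1 for file1, column 2 for file2.
sort -t '|' -k 1,1 "$dir/file1" > "$dir/file1.sorted"
sort -t '|' -k 2,2 "$dir/file2" > "$dir/file2.sorted"

# join prints the key first, then the rest of file1, then the rest of file2.
joined=$(join -1 1 -2 2 -t '|' "$dir/file1.sorted" "$dir/file2.sorted")
echo "$joined"
rm -r "$dir"
```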

If you don't want the key field in the output, pipe either of those commands through cut -d \| -f 2- to drop it, for example:

join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
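
As an alternative to the extra cut step, the awk variant can print just the wanted columns itself. This is a sketch that assumes file1 always has exactly four columns, as in the question; the temporary directory and the variable name result exist only for the demo.

```shell
dir=$(mktemp -d)
printf '123|this|is|good\n456|this|is|better\n' > "$dir/file1"
printf 'aaa|123\nbbb|456\n' > "$dir/file2"

# Print columns 2-4 of file1 followed by column 1 of file2,
# leaving the key out. Assumes four columns in file1.
result=$(awk -F '|' 'BEGIN { OFS = FS }
  NR == FNR { val[$2] = $1; next }
  $1 in val { print $2, $3, $4, val[$1] }' "$dir/file2" "$dir/file1")

echo "$result"
rm -r "$dir"
```

If file1 can have a varying number of columns, the cut approach is the safer choice, since it does not hard-code field numbers.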
