user3653270

Reputation: 166

Joining two files on matching column values using awk

I know similar questions have been posted, but I'm still having trouble getting the output I want with awk's FNR==NR idiom. I have two files like these:

File 1: 
123|this|is|good
456|this|is|better
...

File 2:
aaa|123
bbb|456
...

I want to join the files where column 2 of file 2 matches column 1 of file 1, and output columns 2, 3 and 4 of file 1 followed by column 1 of file 2.

Thanks in advance.

Upvotes: 1

Views: 1252

Answers (2)

Wintermute

Reputation: 44073

With awk, you could do something like this:

awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } $1 in val { $(NF + 1) = val[$1]; print }' file2 file1

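NR == FNR is true only while awk reads the first file named on the command line (file2), because NR counts records across all input while FNR restarts at 1 for each file. That block stores column 1 of file2 in val, keyed by column 2, and next skips the remaining rules. NF is the number of fields in a record (a line, by default), so $NF is the last field and $(NF + 1) is the field after it. Assigning the value saved from file2 to $(NF + 1) appends a new field to the record before it is printed; setting OFS = FS makes the rebuilt record use | as its separator.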
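A self-contained sketch that reproduces the inner join above on the question's sample data, assuming a POSIX shell with mktemp. The third row of file1 (789) is hypothetical and exists only to show that unmatched keys are dropped:

```shell
#!/bin/sh
# Reproduce the inner join on the question's sample data.
# The 789 row in file1 is hypothetical: it has no partner in file2.
tmp=$(mktemp -d)
printf '%s\n' '123|this|is|good' '456|this|is|better' '789|no|match|here' > "$tmp/file1"
printf '%s\n' 'aaa|123' 'bbb|456' > "$tmp/file2"

# Pass 1 (NR == FNR, i.e. file2): remember column 1, keyed by column 2.
# Pass 2 (file1): print only lines whose column 1 was seen in file2,
# with the remembered value appended as a new last field.
inner=$(awk -F '|' 'BEGIN { OFS = FS }
  NR == FNR { val[$2] = $1; next }
  $1 in val { $(NF + 1) = val[$1]; print }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$inner"
rm -r "$tmp"
```

One caveat of the NR == FNR idiom: if file2 is empty, NR == FNR is also true while reading file1, so every line of file1 is consumed by the first block and nothing is printed.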

One thing to note: this behaves like an inner join, i.e., only records whose key appears in both files are printed. To keep every record of file1 as well (a right join here, since file1 is the second file on the command line), you can use

awk -F \| 'BEGIN { OFS = FS } NR == FNR { val[$2] = $1; next } { $(NF + 1) = val[$1]; print }' file2 file1

That is, drop the $1 in val condition from the append-and-print action. If $1 is not in val, val[$1] is empty, so an empty field is appended to the record before printing.
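A similar sketch for the outer-join variant, again on hypothetical sample files. Note the trailing | on the unmatched line: that is the empty field being appended:

```shell
#!/bin/sh
# Outer-join variant: every line of file1 is printed; a line whose key
# is missing from file2 gets an empty trailing field.
# Sample data is hypothetical (789 has no partner in file2).
tmp=$(mktemp -d)
printf '%s\n' '123|this|is|good' '789|no|match|here' > "$tmp/file1"
printf '%s\n' 'aaa|123' > "$tmp/file2"

outer=$(awk -F '|' 'BEGIN { OFS = FS }
  NR == FNR { val[$2] = $1; next }
  { $(NF + 1) = val[$1]; print }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$outer"
rm -r "$tmp"
```

If an empty field is inconvenient, a placeholder could be assigned instead, e.g. $(NF + 1) = ($1 in val ? val[$1] : "NA"), where "NA" is an arbitrary choice.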

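But it's probably better to use join, as long as both files are sorted on their join fields (join expects sorted input, so sort them first if they aren't):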

join -1 1 -2 2 -t \| file1 file2

If you don't want the key field in the output, pipe the output of any of these commands through cut -d \| -f 2- to remove it, e.g.

join -1 1 -2 2 -t \| file1 file2 | cut -d \| -f 2-
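join reads both inputs in a single merge pass, which is why they must be sorted on the join field. A sketch under that assumption, using temporary files instead of bash process substitution so it runs in plain sh. LC_ALL=C is set so that sort and join agree on collation order, and the out-of-order file1 is hypothetical:

```shell
#!/bin/sh
# Use the same (C) collation for sort and join so their orderings agree.
export LC_ALL=C
tmp=$(mktemp -d)
# Hypothetical sample: file1 is deliberately out of order.
printf '%s\n' '456|this|is|better' '123|this|is|good' > "$tmp/file1"
printf '%s\n' 'aaa|123' 'bbb|456' > "$tmp/file2"

# Sort each file on its own join field.
sort -t '|' -k 1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -t '|' -k 2,2 "$tmp/file2" > "$tmp/file2.sorted"

# join prints the key, then file1's other fields, then file2's other field;
# cut then drops the key column.
joined=$(join -1 1 -2 2 -t '|' "$tmp/file1.sorted" "$tmp/file2.sorted" | cut -d '|' -f 2-)

printf '%s\n' "$joined"
rm -r "$tmp"
```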

Upvotes: 3

glenn jackman

Reputation: 247250

If the files have the same number of lines in the same order, then

paste -d '|' file1 file2 | cut -d '|' -f 2-5

Output:
this|is|good|aaa
this|is|better|bbb

I see in a comment to Wintermute's answer that the files aren't sorted. With bash, process substitutions are handy to sort on the fly:

paste -d '|' <(sort -t '|' -k 1,1 file1) <(sort -t '|' -k 2,2 file2) |
cut -d '|' -f 2-5

To reiterate: this solution requires a one-to-one correspondence between the files, i.e., every key must appear exactly once in each file.
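Because paste pairs lines purely by position, a cheap safeguard is to compare the sorted key columns before pasting. A sketch of that check (the guard is an addition, not part of the answer above), assuming exactly four fields in file1 as the cut -f 2-5 range does, and using hypothetical out-of-order sample data:

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
# Hypothetical sample: file1 is out of order.
printf '%s\n' '456|this|is|better' '123|this|is|good' > "$tmp/file1"
printf '%s\n' 'aaa|123' 'bbb|456' > "$tmp/file2"

sort -t '|' -k 1,1 "$tmp/file1" > "$tmp/f1"
sort -t '|' -k 2,2 "$tmp/file2" > "$tmp/f2"

# Only paste when both sorted key columns are identical, which catches
# keys that are missing from, or extra in, either file.
if [ "$(cut -d '|' -f 1 "$tmp/f1")" = "$(cut -d '|' -f 2 "$tmp/f2")" ]; then
  pasted=$(paste -d '|' "$tmp/f1" "$tmp/f2" | cut -d '|' -f 2-5)
  printf '%s\n' "$pasted"
else
  echo 'keys differ: paste would pair unrelated lines' >&2
fi
rm -r "$tmp"
```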

Upvotes: 1
