Reputation: 633
I have two pretty large files, each with 3 columns. Columns 1 and 2 form a pair, and both must match columns 1 and 2 in the second file (in the same order). If columns 1 and 2 from File 1 match in File 2, then print column 3 of File 2 next to column 3 of File 1.
File 1:
THR190 GLU195 50.6
VAL188 ASP197 99.2
PHE199 LYS184 2.5
.......
File 2:
THR190 GLU195 0.6
PHE199 LYS184 100.0
ARG196 VAL188 22.5
.......
Output:
THR190 GLU195 50.6 0.6
VAL188 ASP197 99.2
PHE199 LYS184 2.5 100.0
ARG196 VAL188 22.5
.......
Let's say there is a line in File 2 that doesn't match a line in File 1. Is there a way I can also print that line in the output, as shown in the example?
I know how to make an array and match the contents of the second file with awk, but I'm not sure how to alter the print statement to get my desired output.
awk 'FNR==NR{lines[$1]=$2;next}{print $0,lines[$1]}'
Upvotes: 0
Views: 109
Reputation: 37404
Something like this should work:
$ awk 'NR==FNR {                    # process file1
    a[$1 OFS $2]=$3                 # hash to a
    next
}
{                                   # process file2
    print $1,$2,a[$1 OFS $2],$3     # output
    delete a[$1 OFS $2]             # delete processed entries of file1
}
END {                               # dump the rest of records from file1
    for(i in a)
        print i,a[i]
}' file1 file2
Output:
THR190 GLU195 50.6 0.6
PHE199 LYS184 2.5 100.0
ARG196 VAL188 22.5
VAL188 ASP197 99.2
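The output is not as pretty as your expected output, though: lines follow the order of file2, and the unmatched records from file1 come last, in whatever order for(i in a) happens to yield them.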
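If the order of your expected output matters, one possible variation is to read file2 first and then walk file1 in its own order. This is only a sketch under a few assumptions: the array names b and order, the counter n, and the temporary sample files are made up for illustration (the data is copied from the question), and it holds file2 in memory rather than file1.

```shell
# Sample data copied from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'THR190 GLU195 50.6' 'VAL188 ASP197 99.2' 'PHE199 LYS184 2.5' > "$tmp/file1"
printf '%s\n' 'THR190 GLU195 0.6' 'PHE199 LYS184 100.0' 'ARG196 VAL188 22.5' > "$tmp/file2"

out=$(awk '
NR == FNR {                     # first file on the command line: file2
    key = $1 OFS $2
    b[key] = $3                 # column 3 of file2, keyed by the pair
    order[++n] = key            # remember the order of file2 lines
    next
}
{                               # second file: file1, printed in its own order
    key = $1 OFS $2
    if (key in b) {
        print $0, b[key]        # pair found in file2: append its column 3
        delete b[key]           # mark this file2 record as used
    } else {
        print                   # no match: print the file1 line unchanged
    }
}
END {                           # leftover file2 records, in file2 order
    for (i = 1; i <= n; i++)
        if (order[i] in b)
            print order[i], b[order[i]]
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$out"
```

Note that the files are passed in reverse (file2 first, then file1), and that the numeric order array is used in END instead of for(i in b) so the leftover lines come out in a predictable order.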
Upvotes: 2