EA00

Reputation: 633

Awk - print matches and non matches in same file

I have two fairly large files, each with 3 columns. Columns 1 and 2 form a pair, and both must match columns 1 and 2 of the second file (in the same order). If the pair from File 1 also appears in File 2, print column 3 of File 2 next to column 3 of File 1.

File 1:
    THR190   GLU195   50.6
    VAL188   ASP197   99.2
    PHE199   LYS184    2.5
    .......

File 2: 
    THR190   GLU195   0.6 
    PHE199   LYS184  100.0 
    ARG196   VAL188  22.5 
    .......

Output:
    THR190   GLU195   50.6  0.6
    VAL188   ASP197   99.2  
    PHE199   LYS184   2.5   100.0
    ARG196   VAL188         22.5
    .......

If a line in File 2 has no match in File 1, is there a way to print that line in the output as well, as shown in the example?

I know how to make an array and match the contents of the second file with awk, but I'm not sure how to alter the print statement to get my desired output.

awk 'FNR==NR{lines[$1]=$2;next}{print $0,lines[$1]}'

Upvotes: 0

Views: 109

Answers (1)

James Brown

Reputation: 37404

Something like this should work:

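$ awk 'NR==FNR{                  # first file (file1)
    a[$1 OFS $2]=$3              # store column 3, keyed by the column 1+2 pair
    next
}
{                                # second file (file2)
    print $1,$2,a[$1 OFS $2],$3  # pair, file1 value (empty if no match), file2 value
    delete a[$1 OFS $2]          # drop the pair so only unmatched file1 pairs remain
}
END {                            # print pairs found only in file1
    for(i in a)                  # iteration order is unspecified
        print i,a[i]
}' file1 file2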

Output:

THR190 GLU195 50.6 0.6
PHE199 LYS184 2.5 100.0
ARG196 VAL188  22.5
VAL188 ASP197 99.2

The output is not aligned as neatly as your expected output, though, and lines found only in file1 are printed last, in no particular order.
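If the column alignment and file1's line order matter, one possible variation is to read file2 into the array first and then walk file1, using `printf` with fixed widths. This is only a sketch: the function name `merge_pairs`, the temporary file names and the column widths (8, 8 and 6) are assumptions chosen for the sample data, not anything from the question.

```shell
# Sample inputs copied from the question; the temp file names are placeholders.
tmp=$(mktemp -d)
printf '%s\n' 'THR190   GLU195   50.6' 'VAL188   ASP197   99.2' \
              'PHE199   LYS184    2.5' > "$tmp/file1"
printf '%s\n' 'THR190   GLU195   0.6' 'PHE199   LYS184  100.0' \
              'ARG196   VAL188  22.5' > "$tmp/file2"

# merge_pairs FILE1 FILE2 (hypothetical helper name)
merge_pairs() {
    awk '
    NR == FNR {                   # first file given to awk: file2
        b[$1 OFS $2] = $3         # store column 3, keyed by the pair
        next
    }
    {                             # second file given to awk: file1, in its own order
        k = $1 OFS $2
        printf "%-8s %-8s %-6s %s\n", $1, $2, $3, ((k in b) ? b[k] : "")
        delete b[k]               # keep only pairs not seen in file1
    }
    END {                         # pairs found only in file2 (order unspecified)
        for (k in b) {
            split(k, p, OFS)      # recover the two key columns
            printf "%-8s %-8s %-6s %s\n", p[1], p[2], "", b[k]
        }
    }' "$2" "$1"                  # note the swapped order: file2 is read first
}

merge_pairs "$tmp/file1" "$tmp/file2"
rm -rf "$tmp"
```

Here the smaller memory cost goes to whichever file is read second, so with two large files it may be worth loading the smaller one into the array. If there are many file2-only lines and their order matters, they would need to be sorted separately, since `for (k in b)` does not guarantee any order.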

Upvotes: 2
