awk to match two fields in two files

Question

I want to find lines where fields 1 and 2 from file1 match fields 2 and 3 from file2, and then print all fields from file2. There are more lines in file2 than in file1.

File1

rs116801199 720381
rs138295790 16057310
rs131531 16870251
rs131546 16872281
rs140375 16873251
rs131552 16873461

File2

--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
1 rs12565286 721290 0.028 1.000 1.000 2 0.370 0.934 0.000
1 rs3094315 752566 0.432 1.000 1.000 2 0.678 0.671 0.435
--- rs3131972 752721 0.353 0.906 0.938 0 -1 -1 -1
--- rs61770173 753405 0.481 0.921 0.950 0 -1 -1 -1

I tried something like:

awk -F 'FNR==NR{a[$1];b[$2];next} FNR==1 || ($2 in a && $3 in b)' file1 file2 > test

But I got a syntax error.

John1024 · Accepted Answer

Consider:

awk -F 'FNR==NR{a[$1];b[$2];next} FNR==1 || ($2 in a && $3 in b)' file1 file2

The -F option expects an argument (the field separator), but none is provided here. As a result, awk takes the entire program text as the field separator and then treats the next argument, file1, as the program. That is why the code does not run as expected.
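For comparison, here is a minimal sketch of how -F is normally used. The colon-separated input is made up for illustration and is not from the question. The separator comes right after -F, and the program follows as its own argument:

```shell
# -F consumes the next argument (':') as the field separator;
# the awk program is the argument that follows it.
second=$(printf 'a:b:c\n' | awk -F ':' '{print $2}')
printf '%s\n' "$second"
```

Since the files in the question are whitespace-separated, which is awk's default, -F is not needed there at all.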

From the problem statement, I didn't see why FNR==1 should be in the code, so I removed it. Once that is done, the parentheses are unnecessary, and the code simplifies to:

$ awk 'FNR==NR{a[$1];b[$2];next} $2 in a && $3 in b' file1 file2 
--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
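
One caveat: because the two fields are stored in separate arrays, a line of file2 matches whenever its field 2 appears anywhere in file1's first column and its field 3 appears anywhere in file1's second column, even if the two values come from different lines of file1. If the pair has to come from the same line of file1, a composite array key is one way to do it. Below is a minimal sketch of that variant. The temporary directory, the array name seen, and the third file2 row (an id from one file1 line combined with a position from another) are assumptions made up for illustration:

```shell
# Work in a scratch directory so no existing files are overwritten.
tmp=$(mktemp -d)

# Two rows taken from the question's file1.
cat > "$tmp/file1" <<'EOF'
rs116801199 720381
rs138295790 16057310
EOF

# Rows 1 and 2 come from the question's file2; row 3 is hypothetical.
cat > "$tmp/file2" <<'EOF'
--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
1 rs12565286 721290 0.028 1.000 1.000 2 0.370 0.934 0.000
--- rs138295790 720381 0.353 0.906 0.938 0 -1 -1 -1
EOF

# First file: remember each (field 1, field 2) pair under one composite key.
# Second file: print lines whose (field 2, field 3) pair was remembered.
result=$(awk 'FNR==NR{seen[$1,$2]; next} ($2,$3) in seen' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this input, only the first file2 row is printed. The two-array version would also print the hypothetical third row, because rs138295790 and 720381 each occur somewhere in file1.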
