theo4786

awk to match two fields in two files

I want to find the lines where fields 1 and 2 of file1 match fields 2 and 3 of file2, and then print all fields of those file2 lines. file2 has more lines than file1.

File1

rs116801199 720381 
rs138295790 16057310
rs131531 16870251
rs131546 16872281
rs140375 16873251
rs131552 16873461

File2

--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
1 rs12565286 721290 0.028 1.000 1.000 2 0.370 0.934 0.000
1 rs3094315 752566 0.432 1.000 1.000 2 0.678 0.671 0.435
--- rs3131972 752721 0.353 0.906 0.938 0 -1 -1 -1
--- rs61770173 753405 0.481 0.921 0.950 0 -1 -1 -1

I tried something like:

awk -F 'FNR==NR{a[$1];b[$2];next} FNR==1 || ($2 in a && $3 in b)' file1 file2 > test

But I got a syntax error.


Answers (1)

John1024

Consider:

awk -F 'FNR==NR{a[$1];b[$2];next} FNR==1 || ($2 in a && $3 in b)' file1 file2

The -F option expects the field separator as its next argument, but none was supplied (presumably by accident). As a result, awk takes the entire program string as the field separator and treats the next argument, file1, as the program text. That is why the command does not run as expected.

The problem statement gives no reason for FNR==1 (it would unconditionally print the first line of file2), so I removed it. Without it, the parentheses are unnecessary, and the code simplifies to:

$ awk 'FNR==NR{a[$1];b[$2];next} $2 in a && $3 in b' file1 file2 
--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
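One caveat with `$2 in a && $3 in b`: it looks up the ID and the position independently, so a file2 line whose ID comes from one file1 line and whose position comes from another would also be printed. If the pair must match as a unit, the sketch below keys the array on both fields together using awk's multi-subscript form `seen[$1, $2]` and tests it with `($2, $3) in seen`. It assumes whitespace-separated fields as in the question; the sample files are trimmed copies of the question's data plus one hypothetical file2 line (`rs131531 16057310 ...`) added only to show the difference, and the temp-directory handling exists just for the demo.

```shell
#!/bin/sh
# Demo only: build small copies of the two input files in a temp directory.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
rs116801199 720381
rs138295790 16057310
rs131531 16870251
EOF

# The last line is HYPOTHETICAL: its ID and position come from two
# different file1 lines, so it should not count as a match.
cat > "$tmp/file2" <<'EOF'
--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
1 rs12565286 721290 0.028 1.000 1.000 2 0.370 0.934 0.000
1 rs3094315 752566 0.432 1.000 1.000 2 0.678 0.671 0.435
--- rs131531 16057310 0.100 1.000 1.000 0 -1 -1 -1
EOF

# The answer's version: ID and position are looked up independently,
# so the hypothetical cross-matched line is printed too.
loose=$(awk 'FNR==NR{a[$1];b[$2];next} $2 in a && $3 in b' "$tmp/file1" "$tmp/file2")

# Pair-keyed version: while reading file1 (FNR==NR), store $1 SUBSEP $2
# as a single key; for file2, print the line only if ($2, $3) is a stored pair.
result=$(awk 'FNR==NR{seen[$1, $2]; next} ($2, $3) in seen' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data, `$result` holds only the rs116801199 line, while `$loose` also contains the hypothetical cross-matched line. With the real files only the awk command itself is needed: `awk 'FNR==NR{seen[$1, $2]; next} ($2, $3) in seen' file1 file2`.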
