Reputation: 159
I want to find lines where fields 1 and 2 from file1 match fields 2 and 3 from file2, and then print all fields from file2. There are more lines in file2 than in file1.
File1
rs116801199 720381
rs138295790 16057310
rs131531 16870251
rs131546 16872281
rs140375 16873251
rs131552 16873461
File2
--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
1 rs12565286 721290 0.028 1.000 1.000 2 0.370 0.934 0.000
1 rs3094315 752566 0.432 1.000 1.000 2 0.678 0.671 0.435
--- rs3131972 752721 0.353 0.906 0.938 0 -1 -1 -1
--- rs61770173 753405 0.481 0.921 0.950 0 -1 -1 -1
I tried something like:
awk -F 'FNR==NR{a[$1];b[$2];next} FNR==1 || ($2 in a && $3 in b)' file1 file2 > test
But I got a syntax error.
Upvotes: 1
Views: 171
Reputation: 113964
Consider:
awk -F 'FNR==NR{a[$1];b[$2];next} FNR==1 || ($2 in a && $3 in b)' file1 file2
The option -F expects an argument, but none is provided. As a result, awk interprets the entirety of the code as the field separator, which is why the code does not run as expected.
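To see this behaviour in isolation, here is a minimal sketch. The input string and the variable name `swallowed` are made up for illustration. The string after -F looks like awk code, but awk uses it only as a separator and runs the next argument as the program.

```shell
# -F consumes the next argument, so 'NR==1' becomes the field separator
# and '{print $2}' is the program that awk actually runs.
swallowed=$(printf 'leftNR==1right\n' | awk -F 'NR==1' '{print $2}')
printf '%s\n' "$swallowed"   # prints: right
```

In the command from the question, the same thing happens one position later: the intended program becomes the separator, and file1 is taken as the program text.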
From the problem statement, I don't see why FNR==1 should be in the code, so I removed it. Once that is done, the parentheses are unnecessary, and the code simplifies to:
$ awk 'FNR==NR{a[$1];b[$2];next} $2 in a && $3 in b' file1 file2
--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1
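One possible caveat, depending on what "match" should mean: with two separate arrays, a line of file2 is printed whenever its field 2 appears anywhere in column 1 of file1 and its field 3 appears anywhere in column 2, even if the two values come from different lines of file1. If both fields must come from the same file1 line, a sketch along these lines keys on the pair instead. The array name `seen`, the temporary files, and the third file2 line are hypothetical, added only to show the difference.

```shell
# Sample data: two lines from the question's file1, two from its file2,
# plus one invented file2 line that mixes an id and a position from
# different file1 lines.
tmpdir=$(mktemp -d)
printf '%s\n' 'rs116801199 720381' 'rs138295790 16057310' > "$tmpdir/file1"
printf '%s\n' \
  '--- rs116801199 720381 0.026 0.939 0.996 0 -1 -1 -1' \
  '1 rs12565286 721290 0.028 1.000 1.000 2 0.370 0.934 0.000' \
  '--- rs138295790 720381 0.353 0.906 0.938 0 -1 -1 -1' > "$tmpdir/file2"

# Two independent arrays: the invented mixed line also matches.
loose=$(awk 'FNR==NR{a[$1];b[$2];next} $2 in a && $3 in b' "$tmpdir/file1" "$tmpdir/file2")

# One array keyed on the (id, position) pair: only same-line pairs match.
paired=$(awk 'FNR==NR{seen[$1,$2];next} ($2,$3) in seen' "$tmpdir/file1" "$tmpdir/file2")

printf 'loose:\n%s\n' "$loose"
printf 'paired:\n%s\n' "$paired"
rm -r "$tmpdir"
```

On this sample, the two-array version prints both the rs116801199 line and the invented rs138295790 line, while the paired version prints only the rs116801199 line. On the data shown in the question, both versions give the same result.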
Upvotes: 1