Reputation: 177
I would like to match two files on two column values. If the "BP" and "P" values on a line of file 1 both match those on a line of file 2, I want to print that line to a third file, in the same format as file 1.
File 1:
CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
10 110408937 4.409e+00 1.623e+00 6.602e-03 2 1 Cardiovascular rs113627704
10 110408937 2.382e+00 1.124e+00 3.414e-02 3 1 Medication rs113627704
File 2:
CHR F SNP BP P TOTAL
10 1 rs113627704 110408937 1.112e-02 456
4 1 rs43567 2345677 0.045457 567
3 1 rs567899 479899 0.3456 223
Desired output:
CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
I have tried the following two commands:
awk 'FNR==NR{a[$4,$5]=$0;next}{if(b=a[$2,$5]){print b}}' file1 file2 > file3
Here I get the error "bash: awk: command not found." I use awk all the time and it always works.
awk 'FNR==NR {a[$4,$5]=$0; next} ($4,$5) in a {print a[$2,$5], $0}' file1 file2 > file3
Here I get an empty file.
Upvotes: 0
Views: 694
Reputation: 203209
There are one or more invisible characters in the word awk in your command:
awk 'FNR==NR{a[$4,$5]=$0;next}{if(b=a[$2,$5]){print b}}' file1 file2 > file3
Using the string from your command:
$ type awk
-bash: type: awk: not found
Manually typing awk:
$ type awk
awk is hashed (/usr/bin/awk)
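To see what is actually hiding in a pasted word, you can dump its bytes. This is only a sketch: the stray character in the question's command is unknown, so the zero-width space (U+200B) used here is an assumption for illustration, and the variable names are made up.

```shell
# Hypothetical sample: "awk" preceded by a zero-width space (U+200B, bytes e2 80 8b).
suspect=$(printf '\342\200\213awk')
clean='awk'

# od -c prints one entry per byte; anything besides a, w, k is a stray character.
printf '%s' "$suspect" | od -c
printf '%s' "$clean" | od -c

# The byte counts differ even though both strings look the same on screen.
suspect_bytes=$(printf '%s' "$suspect" | wc -c | tr -d ' ')
clean_bytes=$(printf '%s' "$clean" | wc -c | tr -d ' ')
echo "suspect=$suspect_bytes clean=$clean_bytes"   # prints: suspect=6 clean=3
```

In practice you would paste the suspicious word from your own command in place of the sample string. Retyping the command by hand is usually the quickest fix.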
Upvotes: 2
Reputation: 37394
This should work:
$ awk 'NR==FNR{a[$4,$5]=$0;next}(($2,$5) in a)' file2 file1
Output:
CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
Explained:
$ awk '
NR==FNR {            # first file (file2): the lines we want to print come from file1
    a[$4,$5]=$0      # use the 4th and 5th fields (BP and P) as the hash key
    next             # skip to the next record
}                    # file1 is processed below this point
(($2,$5) in a)       # print the line if its 2nd and 5th fields are in the hash
' file2 file1        # mind the file order
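For anyone who wants to try this end to end, here is a sketch that recreates the sample files from the question in a scratch directory and writes the result to file3. The scratch directory and the array name seen are my own choices. It is a slight variant of the command above: it only records the keys and does not store the file2 line, since that line is never printed.

```shell
# Recreate the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1 <<'EOF'
CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
10 110408937 4.409e+00 1.623e+00 6.602e-03 2 1 Cardiovascular rs113627704
10 110408937 2.382e+00 1.124e+00 3.414e-02 3 1 Medication rs113627704
EOF

cat > file2 <<'EOF'
CHR F SNP BP P TOTAL
10 1 rs113627704 110408937 1.112e-02 456
4 1 rs43567 2345677 0.045457 567
3 1 rs567899 479899 0.3456 223
EOF

# Pass 1 (file2): record each (BP, P) pair, fields 4 and 5.
# Pass 2 (file1): print lines whose (BP, P) pair, fields 2 and 5, was recorded.
awk 'NR==FNR { seen[$4,$5]; next } ($2,$5) in seen' file2 file1 > file3

cat file3
```

The header is printed because the literal strings BP and P match in both files. Note that the keys are compared as text, so 1.112e-02 in one file would not match 0.01112 in the other.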
Upvotes: 5