Reputation: 83
I am trying to use fields from one file to find all matches from another file and then print all fo the values from both. I would like to match the first field of file one with the first field of file two. and for all possible combinations of matches print fields from both files. something like:
file_1.txt:
1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67369.1 188011 bb_T1PKS
and
file_2.txt
1051mF-Z1 PHH67296.1 2611 PF00083.25
1051mF-Z1 PHH67296.1 2611 PF06609.14
1051mF-Z1 PHH67296.1 2611 PF07690.17
1051mF-Z1 PHH67305.1 29486 PF07883.12
1051mF-Z2 PHH58939.2 33999 PF11110.39
1051mF-Z3 PHH58936.2 55599 PF41114.2
Desired output:
1051mF-Z1 PHH67296.1 2611 PF00083.25 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF06609.14 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF07690.17 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67305.1 29486 PF07883.12 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF00083.25 1051mF-Z1 PHH67369.1 188011 bb_T1PKS
1051mF-Z1 PHH67296.1 2611 PF06609.14 1051mF-Z1 PHH67369.1 188011 bb_T1PKS
1051mF-Z1 PHH67296.1 2611 PF07690.17 1051mF-Z1 PHH67369.1 188011 bb_T1PKS
1051mF-Z1 PHH67305.1 29486 PF07883.12 1051mF-Z1 PHH67369.1 188011 bb_T1PKS
However the code I have been using:
awk 'NR==FNR{a[$1]=$0; next} {print $0,a[$1]}' file_1.txt file_2.txt
Only gives me the match from the first line of the first file and ignores the second?
Current output:
1051mF-Z1 PHH67296.1 2611 PF00083.25 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF06609.14 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF07690.17 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67305.1 29486 PF07883.12 1051mF-Z1 PHH67392.1 241796 bb_indole
Any suggestions would be appreciated! I have been able to do this with a loop -- but it takes forever and I was hoping to do it with an awk one liner if possible.
Upvotes: 1
Views: 88
Reputation: 26495
Using GNU join?
E.g.
join -j1 -o 2.1,2.2,2.3,2.4,1.1,1.2,1.3,1.4 <(sort -k1,1 file1.txt) <(sort -k1,1 file2.txt)
Explanation:
-j1
: join both files on the first column
-o 2.1,2.2,2.3,2.4,1.1,1.2,1.3,1.4
: specify the order of the output (2.1
means second file, first column, 2,2
means second file second column etc - you can alter this to suit)
<(sort -k1,1 file1.txt) / <(sort -k1,1 file2.txt)
join requires both files be sorted, so sort them on their first column (sort -k1,1
)
Upvotes: 2