Ascaris
Ascaris

Reputation: 83

use fields in one file to find fields in another and print all matches with fields from both

I am trying to use fields from one file to find all matches from another file and then print all fo the values from both. I would like to match the first field of file one with the first field of file two. and for all possible combinations of matches print fields from both files. something like:

file_1.txt:
1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67369.1 188011 bb_T1PKS

and

file_2.txt
1051mF-Z1 PHH67296.1 2611 PF00083.25
1051mF-Z1 PHH67296.1 2611 PF06609.14
1051mF-Z1 PHH67296.1 2611 PF07690.17
1051mF-Z1 PHH67305.1 29486 PF07883.12
1051mF-Z2 PHH58939.2 33999 PF11110.39
1051mF-Z3 PHH58936.2 55599 PF41114.2

Desired output:

1051mF-Z1 PHH67296.1 2611 PF00083.25 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF06609.14 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF07690.17 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67305.1 29486 PF07883.12 1051mF-Z1 PHH67392.1 241796 bb_indole    
1051mF-Z1 PHH67296.1 2611 PF00083.25 1051mF-Z1 PHH67369.1 188011 bb_T1PKS
1051mF-Z1 PHH67296.1 2611 PF06609.14 1051mF-Z1 PHH67369.1 188011 bb_T1PKS
1051mF-Z1 PHH67296.1 2611 PF07690.17 1051mF-Z1 PHH67369.1 188011 bb_T1PKS
1051mF-Z1 PHH67305.1 29486 PF07883.12 1051mF-Z1 PHH67369.1 188011 bb_T1PKS

However the code I have been using:

awk 'NR==FNR{a[$1]=$0; next} {print $0,a[$1]}' file_1.txt file_2.txt

Only gives me the match from the first line of the first file and ignores the second?

Current output:

1051mF-Z1 PHH67296.1 2611 PF00083.25 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF06609.14 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67296.1 2611 PF07690.17 1051mF-Z1 PHH67392.1 241796 bb_indole
1051mF-Z1 PHH67305.1 29486 PF07883.12 1051mF-Z1 PHH67392.1 241796 bb_indole

Any suggestions would be appreciated! I have been able to do this with a loop -- but it takes forever and I was hoping to do it with an awk one liner if possible.

Upvotes: 1

Views: 88

Answers (1)

jared_mamrot
jared_mamrot

Reputation: 26495

Using GNU join?

E.g.

join -j1 -o 2.1,2.2,2.3,2.4,1.1,1.2,1.3,1.4 <(sort -k1,1 file1.txt) <(sort -k1,1 file2.txt)

Explanation:

-j1: join both files on the first column

-o 2.1,2.2,2.3,2.4,1.1,1.2,1.3,1.4: specify the order of the output (2.1 means second file, first column, 2,2 means second file second column etc - you can alter this to suit)

<(sort -k1,1 file1.txt) / <(sort -k1,1 file2.txt) join requires both files be sorted, so sort them on their first column (sort -k1,1)

Upvotes: 2

Related Questions