Chiara
Chiara

Reputation: 490

Retrieve all rows from 2 columns matching from 2 different files

I need to retrieve all rows from a file starting from some column matching from another file.

My first file is:

col1,col2,col3
1TF4,WP_110462952.1,AEV67733.1
1TF4,EGD45884.1,AEV67733.1
2BTO,NP_006073.2,XP_037953971.1
2BTO,XP_037953971.1,XP_037953971.1

The second one is:

col1,col2,col3,col4,col5
BAA13425.1,SDD02770.1,38.176,296,175
BAA13425.1,WP_002465021.1,32.056,287,185
BBE42932.1,AEG17356.1,40.909,110,64
BBE42932.1,WP_048124638.1,40.367,109,64

I want to retrieve all rows from the second file, where its file2_col1=file1_col3 and file2_col2=file1_col1

I tried like this but it doesn't print everything

awk -F"," 'FILENAME=="file1"{A[$3$2]=$3$2}
FILENAME=="file2"{if(A[$1$2]){print $0}}' file1 file2  > test

Upvotes: 2

Views: 39

Answers (1)

anubhava
anubhava

Reputation: 786001

I want to retrieve all rows from the second file, where its file2_col1=file1_col3 and file2_col2=file1_col1

You may use this 2 pass awk solution:

awk -F, 'FNR == NR {seen[$3,$1]; next} FNR == 1 || ($1,$2) in seen' file1 file2

col1,col2,col3,col4,col5
BAA13425.1,2BTO,32.056,287,185
BAA13425.1,2BTO,12.410,641,123

Where input files are:

cat file1

col1,col2,col3
1TF4,WP_110462952.1,AEV67733.1
1TF4,EGD45884.BAA13425.1
2BTO,NP_006073.2,BAA13425.1
2BTO,XP_037953971.1,BAA13425.1

cat file2

col1,col2,col3,col4,col5
BAA13425.1,SDD02770.1,38.176,296,175
BAA13425.1,2BTO,32.056,287,185
BBE42932.1,AEG17356.1,40.909,110,64
BBE42932.1,WP_048124638.1,40.367,109,64
BAA13425.1,2BTO,12.410,641,123

Upvotes: 2

Related Questions