Match a single column entry in one file to a column entry in a second file that consists of a list

Question

I need to match a single column entry in one file against a column in a second file that contains a list of entries (in shell). The awk command I've used only matches the first word of the list and doesn't scan through the entire list in that field.

File 1 looks like this:

chr1:725751 LOC100288069        
rs3131980   LOC100288069        
rs28830877  LINC01128       
rs28873693  LINC01128       
rs34221207  ATP4A

File 2 looks like this:

Annotation Total Genes With Ann Your Genes  With Ann)   Your Genes  No Ann) Genome  With Ann)   Genome  No Ann) ln
1   path    hsa00190     Oxidative phosphorylation  55  55  1861    75  1139    5.9 9.64    0   0   ATP12A ATP4A ATP5A1 ATP5E ATP5F1 ATP5G1 ATP5G2 ATP5G3 ATP5J ATP5O ATP6V0A1 ATP6V0A4 ATP6V0D2 ATP6V1A ATP6V1C1 ATP6V1C2 ATP6V1D ATP6V1E1 ATP6V1E2 ATP6V1G3 ATP6V1H COX10 COX17 COX4I1 COX4I2 COX5A COX6B1 COX6C COX7A1 COX7A2 COX7A2L COX7C COX8A NDUFA5 NDUFA9 NDUFB3 NDUFB4 NDUFB5 NDUFB6 NDUFS1 NDUFS3 NDUFS4 NDUFS5 NDUFS6 NDUFS8 NDUFV1 NDUFV3 PP PPA2 SDHA SDHD TCIRG1 UQCRC2 UQCRFS1 UQCRH

Expected output:

rs34221207  ATP4A hsa00190

(please excuse the formatting - all the columns are tab-delimited until the column of gene names, $14, called Genome...)

My command is this:

awk 'NR==FNR{a[$14]=$3; next}a[$2]{print $0 "\t" a[$2]}' file2 file1

All help will be much appreciated!


Answers (1)
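
One likely reason the command in the question falls short is that a[$14]=$3 stores a single key per line of file2, so a lookup on one gene name from file1 cannot find genes that sit inside the list. A possible approach is to split the gene list and store one array entry per gene. The sketch below is only an illustration under some assumptions: file2 is tab-delimited with the pathway ID in $3 and the space-separated gene list in $14 (as described in the question), file1 is whitespace-delimited with the gene in $2, and the sample files created here are shortened, hypothetical stand-ins for the real data.

```shell
#!/bin/sh
# Hypothetical, shortened sample files that mirror the layout in the question.
tmpdir=$(mktemp -d)
printf 'rs28873693\tLINC01128\nrs34221207\tATP4A\n' > "$tmpdir/file1"
printf '1\tpath\thsa00190\tOxidative phosphorylation\t55\t55\t1861\t75\t1139\t5.9\t9.64\t0\t0\tATP12A ATP4A ATP5A1\n' > "$tmpdir/file2"

# file2 is read first with tab as the field separator, so $14 holds the
# whole gene list. FS is then switched to runs of spaces/tabs for file1.
result=$(awk -F'\t' '
NR == FNR {
    # Split the gene list on whitespace and map every gene to the pathway ID.
    n = split($14, genes, " ")
    for (i = 1; i <= n; i++)
        path[genes[i]] = $3
    next
}
# file1: print the line only when its gene was seen in some list.
$2 in path { print $1 "\t" $2 "\t" path[$2] }
' "$tmpdir/file2" FS='[ \t]+' "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the sample data this prints the single line for rs34221207, ATP4A and hsa00190, which is the expected output given in the question. Note that if a gene appears in more than one pathway, this sketch keeps only the last pathway read; appending to path[genes[i]] instead of overwriting it would keep all of them.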
