Fill in missing values from second or third file (bash)

Question

I have the following three files:

list1.txt

AB0001  COG0593
AB0002  COG0592
AB0003  COG1195
AB0005  COG1005
AB0006  COG5621
AB0007  COG4591
AB0008  COG1136
AB0009  COG0071
AB0010  COG3212

list2.txt

AB0001  COG0593
AB0002  COG0592
AB0003  COG1195
AB0004  
AB0005  
AB0006  COG5621
AB0007  COG3127
AB0008  COG1136
AB0009  COG0071
AB0010  COG3212

list3.txt

AB0001  COG0593
AB0002  COG0592
AB0003  COG1195
AB0004  COG5146
AB0005  NOG84439
AB0006  COG5621
AB0007  COG0577
AB0008  COG1136
AB0009  COG0071
AB0010  NOG218375

I want to fill in the missing values for the keys in the first column (AB0001 to AB0010) using column 2 of the other lists. list1 has the highest priority, list2 the second highest, and list3 the lowest. The desired output is:

AB0001  COG0593
AB0002  COG0592
AB0003  COG1195
AB0004  COG5146
AB0005  COG1005
AB0006  COG5621
AB0007  COG4591
AB0008  COG1136
AB0009  COG0071
AB0010  COG3212

In other words, list1 serves as the basis: if a value is missing there, take it from list2; if it is also missing in list2, take it from list3.

jas · Accepted Answer

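Process the files in reverse order of precedence, so that values from higher-precedence files overwrite those from lower-precedence ones. Because a[$1] is reassigned for every matching line, the last file on the command line (list1.txt) wins. The NF > 1 condition skips lines whose value is missing.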

$ awk 'BEGIN {FS=OFS="\t"} NF > 1 {a[$1] = $2} END {for (i in a) print i, a[i]}' list3.txt list2.txt list1.txt | sort
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
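
One caveat: with a tab as FS, a line such as "AB0004<TAB>" (key followed by a trailing tab) still counts as two fields, so NF > 1 would let its empty value through. Whether the files here contain such lines is not clear from the question. As a minimal sketch that sidesteps this, the variant below uses awk's default whitespace splitting and takes the files in priority order, keeping the first non-empty value seen for each key. The function name merge_by_priority and the array name seen are made up for illustration, and the sketch assumes the values themselves contain no whitespace.

```shell
# Hypothetical helper: merge two-column files passed in priority order
# (highest priority first). The first non-empty value seen for a key wins.
merge_by_priority() {
  awk '
    # Default whitespace splitting: a key without a value has an empty $2,
    # whether or not the line ends in a trailing tab or spaces.
    # Blank lines are skipped because $1 is empty.
    $1 != "" && $2 != "" && !($1 in seen) { seen[$1] = $2 }
    # Array traversal order is unspecified in awk, hence the sort below.
    END { for (k in seen) print k "\t" seen[k] }
  ' "$@" | sort
}
```

It would be called as merge_by_priority list1.txt list2.txt list3.txt. Compared with the reverse-order approach, the arguments read in the same order as the stated priority, and later files never overwrite a value that has already been found.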
