Reputation: 845
I have the following three files:
list1.txt
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
list2.txt
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004
AB0005
AB0006 COG5621
AB0007 COG3127
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
list3.txt
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 NOG84439
AB0006 COG5621
AB0007 COG0577
AB0008 COG1136
AB0009 COG0071
AB0010 NOG218375
I want to fill in the missing values for the keys in the first column (AB0001 to AB0010) with values from the second column of the other lists. list1 has the highest priority, list2 the second highest, and list3 the lowest.
So the desired output would be:
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
That is, list1 should serve as the basis: if a value is missing there, take it from list2, and if it is also missing in list2, take it from list3.
Upvotes: 0
Views: 84
Reputation: 92884
A short join + awk combination (join expects each input to be sorted on the join field, which holds for these files):
join -a2 list1.txt list2.txt | join -a2 - list3.txt | awk '{print $1,$2}' OFS='\t'
The output:
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
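One thing to be aware of: `-a2` only keeps unpairable lines from the second input, so a key that appears only in list1.txt would be dropped. As a rough sketch (not part of the original one-liner), adding `-a1` as well keeps such keys. The data below is a shortened, space-separated copy of the question's files, and the `AB0011 COG9999` row is a made-up example of a key that exists only in list1.

```shell
# Scratch copies of the sample data; AB0011/COG9999 is a hypothetical extra row.
tmp=$(mktemp -d)
printf 'AB0001 COG0593\nAB0005 COG1005\nAB0011 COG9999\n' > "$tmp/list1.txt"
printf 'AB0001 COG0593\nAB0004\nAB0005\n' > "$tmp/list2.txt"
printf 'AB0001 COG0593\nAB0004 COG5146\nAB0005 NOG84439\n' > "$tmp/list3.txt"

# -a1 -a2 keeps unpairable lines from both inputs (a full outer join).
# Because empty values produce no field, the first value found ends up in $2.
joined=$(join -a1 -a2 "$tmp/list1.txt" "$tmp/list2.txt" |
         join -a1 -a2 - "$tmp/list3.txt" |
         awk '{print $1, $2}' OFS='\t')
printf '%s\n' "$joined"
rm -rf "$tmp"
```

With this sample, AB0004 picks up COG5146 from list3, AB0005 keeps COG1005 from list1, and AB0011 survives even though it is missing from list2 and list3.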
Upvotes: 0
Reputation: 10865
Process the files in reverse order of precedence, so that values from higher-precedence files overwrite those from lower-precedence ones. The condition NF > 1 ensures that lines with missing values are skipped.
$ awk 'BEGIN {FS=OFS="\t"} NF > 1 {a[$1] = $2} END {for (i in a) print i, a[i]}' list3.txt list2.txt list1.txt | sort
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
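The same precedence can also be expressed the other way round, as a minimal sketch: list the files highest priority first and store a value only when the key has none yet. The rows below are a shortened copy of the question's data, written with tabs as an assumption. The extra `$2 != ""` guard is my addition, so that a line with a trailing tab but no value is also skipped.

```shell
# Scratch copies of the sample data (tab-separated by assumption).
tmp=$(mktemp -d)
printf 'AB0001\tCOG0593\nAB0005\tCOG1005\nAB0007\tCOG4591\n' > "$tmp/list1.txt"
printf 'AB0001\tCOG0593\nAB0004\nAB0005\nAB0007\tCOG3127\n' > "$tmp/list2.txt"
printf 'AB0001\tCOG0593\nAB0004\tCOG5146\nAB0005\tNOG84439\nAB0007\tCOG0577\n' > "$tmp/list3.txt"

# Files are given highest priority first; the first non-empty value per key wins.
result=$(awk 'BEGIN {FS = OFS = "\t"}
     NF > 1 && $2 != "" && !($1 in a) {a[$1] = $2}
     END {for (k in a) print k, a[k]}' \
     "$tmp/list1.txt" "$tmp/list2.txt" "$tmp/list3.txt" | sort)
printf '%s\n' "$result"
rm -rf "$tmp"
```

Since `for (k in a)` iterates in an unspecified order, the `sort` at the end is still needed to get the keys back in sequence.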
Upvotes: 2