AWK replace full string in TABLE2 according to TABLE1

Question

I have TABLE1, where the first column is a string that should be replaced in TABLE2 and the second column is the value that should replace it.

TABLE1 looks like this:

g63. MYL9
g5990. PTC7
g6018. POLYUBQ
g17850. NAA50

TABLE2 looks, for example, like this:

PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . g63.
PIZI01000001v1  AUGUSTUS    intron  751969  752021  1   -   .   transcript_id "g63.t1"; gene_id "g63.
PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . g630.
PIZI01000001v1  AUGUSTUS    intron  16680415    16683083    0.35    +   .   transcript_id "g630.t1"; gene_id "g630.
PIZI01000001v1 AUGUSTUS gene 16695081 16703546 0.93 + . g631.
PIZI01000001v1 AUGUSTUS gene 16730752 16735366 0.65 + . g632.
PIZI01000008v1 AUGUSTUS gene 1943857 1944177 0.71 - . g6299.

So I put together this awk command:

awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' TABLE1 TABLE2

It works, except that a value such as MYL9 replaces not only the string g63. but also strings like g630, g631, g632 ... g6300 and so on. So the final table looks like this:

PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . MYL9
PIZI01000001v1  AUGUSTUS    intron  751969  752021  1   -   .   transcript_id "MYL9"; gene_id "MYL9
PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . MYL9
PIZI01000001v1  AUGUSTUS    intron  16680415    16683083    0.35    +   .   transcript_id "MYL9t1"; gene_id "MYL9
PIZI01000001v1 AUGUSTUS gene 16695081 16703546 0.93 + . MYL9
PIZI01000001v1 AUGUSTUS gene 16730752 16735366 0.65 + . MYL9
PIZI01000008v1 AUGUSTUS gene 1943857 1944177 0.71 - . g6299.

And I need it to edit just g63. and not others like g630. and so on.

I have spent quite a long time on this and now I have to take a break, so if anybody has an idea what's wrong here, I would appreciate it. Thanks.

Answers (1)
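
The likely cause is that gsub() treats its first argument as a regular expression, so the . in g63. matches any character and g630, g631 and so on all match. gsub() also matches anywhere in the line rather than on whole fields. Below is a minimal sketch of one way around this, assuming whole-field matching is what is wanted: each field is looked up in the array instead of being passed to gsub(). The temporary file paths, the names map, core and prefix, and the handling of a leading double quote (for fields like "g63. in the intron lines) are assumptions made for illustration and do not come from the question.

```shell
# Sample data copied from the question; the temp directory is a placeholder.
tmp=$(mktemp -d)
printf '%s\n' 'g63. MYL9' 'g5990. PTC7' > "$tmp/TABLE1"
printf '%s\n' \
  'PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . g63.' \
  'PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . g630.' \
  'PIZI01000001v1 AUGUSTUS intron 751969 752021 1 - . transcript_id "g63.t1"; gene_id "g63.' \
  > "$tmp/TABLE2"

result=$(awk '
  # First file: remember the replacement for each id.
  NR == FNR { map[$1] = $2; next }
  # Second file: replace a field only when the whole field equals an id.
  {
    for (i = 1; i <= NF; i++) {
      prefix = ""
      core = $i
      # Assumption: an id may carry one leading double quote, as in gene_id "g63.
      if (substr(core, 1, 1) == "\"") {
        prefix = "\""
        core = substr(core, 2)
      }
      # Exact string lookup, so no regex metacharacters are involved.
      if (core in map) $i = prefix map[core]
    }
    print
  }
' "$tmp/TABLE1" "$tmp/TABLE2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

On the three sample lines this turns g63. into MYL9 and "g63. into "MYL9, while g630. and "g63.t1"; are left alone. Note that assigning to a field makes awk rebuild that line with single spaces; if TABLE2 is strictly tab-separated, setting FS and OFS to a tab would probably be needed to keep the layout.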
