Reputation: 1
I have TABLE1, where the first column is a string that should be replaced in TABLE2 and the second column is the value that should replace it.
TABLE1 looks like this:
g63. MYL9
g5990. PTC7
g6018. POLYUBQ
g17850. NAA50
TABLE2 looks, for example, like this:
PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . g63.
PIZI01000001v1 AUGUSTUS intron 751969 752021 1 - . transcript_id "g63.t1"; gene_id "g63.
PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . g630.
PIZI01000001v1 AUGUSTUS intron 16680415 16683083 0.35 + . transcript_id "g630.t1"; gene_id "g630.
PIZI01000001v1 AUGUSTUS gene 16695081 16703546 0.93 + . g631.
PIZI01000001v1 AUGUSTUS gene 16730752 16735366 0.65 + . g632.
PIZI01000008v1 AUGUSTUS gene 1943857 1944177 0.71 - . g6299.
So I assembled this awk command:
awk 'FNR==NR { array[$1]=$2; next } { for (i in array) gsub(i, array[i]) }1' TABLE1 TABLE2
which works only up to a point: for example, MYL9 replaces not only the string g63. but also strings like g630, g631, g632 ... g6300 and so on. So the final table would look like this:
PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . MYL9
PIZI01000001v1 AUGUSTUS intron 751969 752021 1 - . transcript_id "MYL9"; gene_id "MYL9
PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . MYL9
PIZI01000001v1 AUGUSTUS intron 16680415 16683083 0.35 + . transcript_id "MYL9t1"; gene_id "MYL9
PIZI01000001v1 AUGUSTUS gene 16695081 16703546 0.93 + . MYL9
PIZI01000001v1 AUGUSTUS gene 16730752 16735366 0.65 + . MYL9
PIZI01000008v1 AUGUSTUS gene 1943857 1944177 0.71 - . g6299.
And I need it to edit just g63. and not others like g630. and so on.
I have spent quite a long time on this and now I have to take a break, so if anybody has an idea what's wrong here, I would appreciate it. Thanks
Upvotes: -2
Views: 75
Reputation: 1
So I solved the problem in an inelegant way. I realized that the trailing dot in the search string is treated as a regex special character (it matches any single character), so I just replaced the dots with underscores.
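An alternative that keeps the dots is to make them literal before the key is used as a pattern. The following is only a sketch, assuming the keys in TABLE1 contain no regex metacharacters other than the dot and the replacement values contain no & or backslash; the temporary directory and the variable names (tmp, key, map, result) are made up for the illustration, and the sample rows are copied from the tables above.

```shell
# Build two tiny sample files from rows shown in the question.
tmp=$(mktemp -d)
printf '%s\n' 'g63. MYL9' 'g5990. PTC7' > "$tmp/TABLE1"
printf '%s\n' \
  'PIZI01000001v1 AUGUSTUS gene 751753 768572 0.06 - . g63.' \
  'PIZI01000001v1 AUGUSTUS gene 16680331 16688019 0.25 + . g630.' > "$tmp/TABLE2"

result=$(awk '
  FNR == NR {                      # first file: build the lookup table
      key = $1
      gsub(/[.]/, "[.]", key)      # turn each "." into "[.]" so it matches only a literal dot
      map[key] = $2
      next
  }
  { for (k in map) gsub(k, map[k]) }   # second file: apply every replacement
  1                                    # print the (possibly modified) line
' "$tmp/TABLE1" "$tmp/TABLE2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With these two sample lines, only the g63. line should change to MYL9, while g630. is left alone, because the pattern g63[.] requires a real dot after g63.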
Upvotes: 0