Jakob Nielsen

Reputation: 73

Replacing characters by chaining str.replace methods produces wrong result

I want to be able to replace certain characters. The desired replacement order should be
A -> U, T -> A, G -> C, C -> G.

But for some reason, C does not get replaced with G. I've included the code that I have so far.

v = "ATGC"
DNA = [v]
MRNA = []
for s in DNA:
    MRNA.append(s.replace('A', 'U').replace('T', 'A').replace('C', 'G').replace('G', 'C'))
print(MRNA)

Upvotes: 4

Views: 502

Answers (3)

Olivier Melançon

Reputation: 22324

Chaining s.replace('C', 'G').replace('G', 'C') first turns every 'C' into a 'G', and the second call then turns every 'G', including the ones just created, into a 'C'.

Instead of chaining multiple str.replace calls, you should use a translation table with str.maketrans and str.translate. Since this works in a single pass, it avoids undoing an earlier replacement and also scales better as the number of replacements grows.

def dna_to_rna(s):
    trans_table = str.maketrans('ATCG', 'UAGC')
    return s.translate(trans_table)

print(dna_to_rna('ACGTAC')) # 'UGCAUG'
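
To see where the chained version goes wrong, it can help to record each intermediate string. The following is only an illustrative sketch: the helper name trace_replacements is hypothetical and not part of the question's code. It also shows that str.maketrans accepts a single dictionary argument as an alternative to the two-string form.

```python
def trace_replacements(s, pairs):
    """Apply str.replace for each (old, new) pair in order and
    return the list of all intermediate strings, starting with s."""
    steps = [s]
    for old, new in pairs:
        s = s.replace(old, new)
        steps.append(s)
    return steps

# The order used in the question: the last two steps interfere.
question_order = [('A', 'U'), ('T', 'A'), ('C', 'G'), ('G', 'C')]
steps = trace_replacements('ATGC', question_order)
print(steps)  # ['ATGC', 'UTGC', 'UAGC', 'UAGG', 'UACC']

# A translation table looks up each original character exactly once,
# so a replaced character is never replaced again.
table = str.maketrans({'A': 'U', 'T': 'A', 'C': 'G', 'G': 'C'})
translated = 'ATGC'.translate(table)
print(translated)  # UACG
```

The trace shows that 'C' does become 'G' in the third step, and that the fourth step then converts every 'G' to 'C'.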

Upvotes: 13

Kelly Bundy

Reputation: 27629

To swap 'G' and 'C', you could use 'T' as a temporary buffer once all original 'T's have been replaced. At that point the string no longer contains any 'T', so reusing it is safe:

>>> 'ATGC'.replace('A', 'U').replace('T', 'A').replace('C', 'T').replace('G', 'C').replace('T', 'G')
'UACG'

This is similar to swapping two variables c and g through a temporary variable t, as you would outside Python:

t = c
c = g
g = t

instead of

c, g = g, c
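
The buffer idea can be generalized a little. The sketch below is an assumption-laden illustration, not part of the answer: the function names swap_chars and dna_to_rna_chained and the default placeholder '\x00' are hypothetical choices. It checks that the placeholder is absent from the input before relying on it.

```python
def swap_chars(s, a, b, placeholder='\x00'):
    """Swap every occurrence of a and b in s with three str.replace calls.

    The placeholder must not occur in s; otherwise the final replace
    would also rewrite characters that were never part of the swap.
    """
    if placeholder in s:
        raise ValueError("placeholder already occurs in the input")
    return s.replace(a, placeholder).replace(b, a).replace(placeholder, b)

def dna_to_rna_chained(s):
    # 'A' and 'T' are handled first; 'C' and 'G' are then swapped safely.
    return swap_chars(s.replace('A', 'U').replace('T', 'A'), 'C', 'G')

print(dna_to_rna_chained('ATGC'))  # UACG
```

Each str.replace call still scans the whole string, so this remains a multi-pass approach.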

Upvotes: -1

Ayush Garg

Reputation: 2517

The problem is that each replace operates on the output of the previous one. After you run .replace('C', 'G'), the string becomes "UAGG", and the next replace turns all G's into C's, so you get "UACC" instead of "UACG". To fix this, you can loop through each character and look it up in a dictionary:

def DNA_to_RNA(s):
    mask_table = {"A": "U", "T": "A", "C": "G", "G": "C"}
    result = []
    for char in s:
        result.append(mask_table[char])
    return ''.join(result)

Or, using a list comprehension:

def DNA_to_RNA(s):
    mask_table = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return ''.join([mask_table[char] for char in s])
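
Note that the dictionary lookup raises a KeyError for any character that is not a key, such as a lowercase letter. If a clearer error is preferred, one possible variant is sketched below; the names COMPLEMENT and dna_to_rna_checked are hypothetical and chosen only for this example.

```python
# Mapping from each DNA base to its replacement, as given in the question.
COMPLEMENT = {"A": "U", "T": "A", "C": "G", "G": "C"}

def dna_to_rna_checked(s):
    """Translate s character by character, raising ValueError
    if a character has no entry in COMPLEMENT."""
    try:
        return ''.join(COMPLEMENT[base] for base in s)
    except KeyError as exc:
        # exc.args[0] is the offending character that was looked up.
        raise ValueError(
            f"unexpected character {exc.args[0]!r} in DNA string"
        ) from None

print(dna_to_rna_checked("ATGC"))  # UACG
```

Whether to reject, skip, or pass through unknown characters depends on the input data, so this is only one of several reasonable choices.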

Upvotes: 1
