user2742080

Reputation: 43

Python3 replace using dictionary

Could anyone please explain what is wrong here:

def get_complementary_sequence(string):
    dic = {'A':'T', 'C':'G', 'T':'A', 'G':'C'}
    for a, b in dic.items():
        string = string.replace(a, b)
    return string

I get proper results for 'T' and 'G', but 'A' and 'C' won't replace. Got really stuck.

String looks like 'ACGTACG'.

Upvotes: 3

Views: 1215

Answers (2)

Martijn Pieters

Reputation: 1122182

You are first replacing all As with Ts, and then replacing all Ts with As again (including the ones you just created from As!):

>>> string = 'ACGTACG'
>>> string.replace('A', 'T')
'TCGTTCG'
>>> string.replace('A', 'T').replace('T', 'A')
'ACGAACG'
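To see the same effect across all four pairs, here is a small trace sketch. It assumes the dictionary is iterated in the order written (guaranteed from Python 3.7 on; older versions may iterate in a different order), and `trace_replacements` is a hypothetical helper, not part of the question's code:

```python
def trace_replacements(sequence, pairs):
    """Apply each (old, new) replacement in turn and record every intermediate string."""
    steps = [sequence]
    for old, new in pairs:
        # Each replace() works on the already-modified string, not the original.
        sequence = sequence.replace(old, new)
        steps.append(sequence)
    return steps

# Same pairs as the question's dictionary, in the order written there (an assumption).
pairs = [('A', 'T'), ('C', 'G'), ('T', 'A'), ('G', 'C')]
for step in trace_replacements('ACGTACG', pairs):
    print(step)
```

The last line printed is 'ACCAACC': the later T->A and G->C passes undo the earlier A->T and C->G passes.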

Use a translation map instead, fed to str.translate():

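def get_complementary_sequence(string):
    # Keys are codepoints (integers); values are the replacement characters.
    transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
    return string.translate(transmap)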

The str.translate() method requires a dictionary mapping codepoints (integers) to replacements: a string (of any length), another codepoint, or None (to delete the character from the input string). The ord() function gives us those codepoints for the given 'from' letters.

This looks up each character of string in the translation map, one by one (in C code), instead of replacing all As and then all Ts.
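As a rough pure-Python picture of that one-by-one lookup (a sketch of the idea only, not how str.translate() is implemented; `complement_single_pass` is a hypothetical name):

```python
def complement_single_pass(sequence, mapping):
    """Map every character of the original string exactly once."""
    # dict.get(base, base) leaves characters without a mapping unchanged.
    return ''.join(mapping.get(base, base) for base in sequence)

complements = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}
print(complement_single_pass('ACGTACG', complements))  # TGCATGC
```

Because each output character is derived from the original input, a replacement can never be replaced again.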

str.translate() has the added advantage of making a single pass over the string, which is typically faster than a series of str.replace() calls.

Demo:

>>> string = 'ACGTACG'
>>> transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
>>> string.translate(transmap)
'TGCATGC'
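The same table can also be built with the static method str.maketrans(), which saves writing the ord() calls by hand. A minimal sketch (the variable names are only illustrative; dropping 'N' is an assumed example, not something the question asks for):

```python
# Two equal-length strings: each character in the first maps to the one
# at the same position in the second.
complement_table = str.maketrans('ACGT', 'TGCA')
print('ACGTACG'.translate(complement_table))  # TGCATGC

# An optional third argument lists characters to delete (mapped to None).
complement_drop_n = str.maketrans('ACGT', 'TGCA', 'N')
print('ACNGT'.translate(complement_drop_n))  # TGCA
```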

Upvotes: 6

9000

Reputation: 40894

Mutable data is your enemy :)

See, you first replace all As with Ts, then, in another iteration, replace all Ts with As again.

What works:

# for Crick and Watson's sake, name your variables sensibly
complements = {ord('A'):'T', ord('C'):'G', ord('T'):'A', ord('G'):'C'}
sequence = "AGCTTCAG"
print(sequence.translate(complements))

It prints TCGAAGTC.

Upvotes: 2
