Reputation: 43
Could anyone please explain what is wrong here:
def get_complementary_sequence(string):
    dic = {'A':'T', 'C':'G', 'T':'A', 'G':'C'}
    for a, b in dic.items():
        string = string.replace(a, b)
    return string
I get proper results for 'T' and 'G', but 'A' and 'C' won't replace. I'm really stuck.
String looks like 'ACGTACG'.
Upvotes: 3
Views: 1215
Reputation: 1122182
You are first replacing all As with Ts, and then replacing all Ts with As again (including those you just replaced As with!):
>>> string = 'ACGTACG'
>>> string.replace('A', 'T')
'TCGTTCG'
>>> string.replace('A', 'T').replace('T', 'A')
'ACGAACG'
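To see the same effect across the whole loop from the question, a small trace can record the string after every pass. This is only a sketch: the helper name trace_replacements is made up for illustration, and it assumes Python 3.7+, where dicts iterate in insertion order.

```python
def trace_replacements(sequence, mapping):
    """Apply each replacement in turn and record the intermediate strings.

    Hypothetical helper for illustration; it reproduces the buggy loop
    from the question rather than fixing it.
    """
    steps = []
    for old, new in mapping.items():  # insertion order on Python 3.7+
        sequence = sequence.replace(old, new)
        steps.append((old, new, sequence))
    return steps


dic = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}
for old, new, result in trace_replacements('ACGTACG', dic):
    print(f"{old} -> {new}: {result}")
```

The last line printed is 'ACCAACC': every A and C has been swapped and then swapped back, while T and G only went through one replacement.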
Use a translation map instead, fed to str.translate():
def get_complementary_sequence(string):
    transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
    return string.translate(transmap)
The str.translate() method requires a dictionary mapping codepoints (integers) to replacements (either a string or a codepoint), or to None (to delete the codepoint from the input string). The ord() function gives us those codepoints for the given 'from' letters.
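The same kind of table can also be built with str.maketrans(), a standard str method in Python 3 that calls ord() on the keys for you. A brief sketch follows; the variable names and the '-' gap character are illustrative choices, not something from the question.

```python
# str.maketrans() with two equal-length strings maps each character of the
# first to the character at the same position in the second.
transmap = str.maketrans('ACGT', 'TGCA')

# The result is a plain dict from codepoint to codepoint.
print(transmap == {ord('A'): ord('T'), ord('C'): ord('G'),
                   ord('G'): ord('C'), ord('T'): ord('A')})

print('ACGTACG'.translate(transmap))

# Mapping a codepoint to None deletes that character from the result.
# The '-' here is a made-up example of a character to strip.
drop_gaps = {ord('-'): None}
print('AC-GT'.translate(drop_gaps))
```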
This looks up each character of string in the translation map, one by one in C code, instead of replacing all As and then all Ts.
str.translate() has the added advantage of being much faster than a series of str.replace() calls.
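The speed difference is easy to measure rather than take on faith. The following is a rough sketch using the standard timeit module; the input length and repeat count are arbitrary choices, and the numbers depend on the Python version, the string length, and the machine.

```python
import timeit

sequence = 'ACGTACG' * 1000  # arbitrary test input
transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}


def with_replace(s):
    # Four full passes over the string (and, as discussed, the wrong result).
    return s.replace('A', 'T').replace('C', 'G').replace('T', 'A').replace('G', 'C')


def with_translate(s):
    # One pass, looking each character up in the map.
    return s.translate(transmap)


replace_time = timeit.timeit(lambda: with_replace(sequence), number=200)
translate_time = timeit.timeit(lambda: with_translate(sequence), number=200)
print(f"replace chain: {replace_time:.4f}s")
print(f"translate:     {translate_time:.4f}s")
```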
Demo:
>>> string = 'ACGTACG'
>>> transmap = {ord('A'): 'T', ord('C'): 'G', ord('T'): 'A', ord('G'): 'C'}
>>> string.translate(transmap)
'TGCATGC'
Upvotes: 6
Reputation: 40894
Mutable data is your enemy :)
See, you first replace all As with Ts, then, in another iteration, replace all Ts with As again.
What works:
# for Crick and Watson's sake, name your variables sensibly
complements = {ord('A'):'T', ord('C'):'G', ord('T'):'A', ord('G'):'C'}
sequence = "AGCTTCAG"
print(sequence.translate(complements))
It prints TCGAAGTC.
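In the same spirit of not mutating anything along the way, the complement can also be built in a single pass with a plain dict lookup per base. This is an alternative sketch, not the approach used above; the names COMPLEMENTS and complement are made up for illustration.

```python
COMPLEMENTS = {'A': 'T', 'C': 'G', 'T': 'A', 'G': 'C'}


def complement(sequence):
    """Return the complement, building a new string in a single pass.

    Illustrative helper: unlike str.translate(), which leaves unmapped
    characters unchanged, this raises KeyError on anything outside ACGT.
    """
    return ''.join(COMPLEMENTS[base] for base in sequence)


print(complement("AGCTTCAG"))
```

Each input character is read exactly once from the original string, so a base that has already been complemented can never be swapped back. Whether failing on unexpected characters is preferable to passing them through depends on the input you expect.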
Upvotes: 2