Reputation:
As a smaller part of a function, this code is intended to replace every 'G' with 'C' and every 'T' with 'A', and vice versa for both. It isn't working as intended: only some of the characters end up replaced instead of all of them.
# dna1.txt contains: GGTACGGATG
file = open('dna1.txt')
contents = file.read()
replaced_contents = (contents.replace('G', 'C').replace('T', 'A')
                     .replace('A', 'T').replace('C', 'G'))
print("Complement: {0}".format(replaced_contents))
Upvotes: 1
Views: 60
Reputation: 177901
Others have pointed out the issue with chained .replace calls, but strings have a built-in translate method that does the job quickly and easily:
dna = "GGTACGGATG"
xlat = str.maketrans('GTAC','CATG') # build translation table.
result = dna.translate(xlat) # translate using table.
print(result)
Output:
CCATGCCTAC
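As a small variation on the above, the table can also be built from a dictionary, which keeps each base next to its complement. This is only a sketch: the names COMPLEMENT and complement are made up for illustration. Characters that are not in the table, such as the trailing newline that file.read() may return, pass through unchanged.

```python
# Hypothetical names for illustration; str.maketrans accepts a dict
# whose keys are single characters.
COMPLEMENT = str.maketrans({"G": "C", "C": "G", "T": "A", "A": "T"})


def complement(sequence):
    """Return the complement of a DNA string in a single pass."""
    return sequence.translate(COMPLEMENT)


# The newline is not in the table, so it is left alone.
print(complement("GGTACGGATG\n"), end="")
```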
Upvotes: 1
Reputation: 16516
Let's run your code step by step:
input = "GGTACGGATG"
Replace all Gs with Cs
input = input.replace('G', 'C')
Now input looks like this:
CCTACCCATC
Wait! We also want to replace all Cs with Gs. How do we know which ones were initially G or C?
This approach doesn't work.
How about we use a token instead of C for the first replacement and then replace the tokens with C at the end? The token should be something that doesn't occur in the text, and it can be one or more characters. Let's use # here. It can be anything, really.
input = input.replace('G', '#')
Now input looks like this:
##TAC##AT#
Okay, let's change all C to G now.
input = input.replace('C', 'G')
and we get
##TAG##AT#
Yay! Now we don't have any Cs left and we know what the Gs were. So let's replace the # now.
input = input.replace('#', 'C')
And we get
CCTAGCCATC
Done! All Gs and Cs have just swapped places!
But how do we find a token that definitely doesn't come up in the input? ...we can't. So, if the input is coming from users and could be anything, DO NOT USE TOKENS for replacement.
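To make that warning concrete, here is a rough Python sketch of the token steps above wrapped in a function. The name swap_with_token and its parameters are made up for this example. The second call shows what goes wrong when the token already appears in the input.

```python
def swap_with_token(text, a, b, token="#"):
    """Swap every `a` and `b` in `text` via a temporary token (hypothetical helper)."""
    return text.replace(a, token).replace(b, a).replace(token, b)


# Works when '#' never appears in the input.
print(swap_with_token("GGTACGGATG", "G", "C"))  # CCTAGCCATC

# Breaks when the input already contains the token:
# the original '#' is turned into 'C' along with the real tokens.
print(swap_with_token("GG#C", "G", "C"))  # CCCG, but CC#G was wanted
```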
What the replace method does is go through the string letter by letter and replace each occurrence with the new letter. For the next replacement we run this loop again.
The safest and most general method, however, is to go through the string in a single loop and do all the replacements at once, like so:
let result = "";
const letters_array = input.split('');
for (const letter of letters_array) {
    if (letter == 'C') {
        result += 'G';
    } else if (letter == 'G') {
        result += 'C';
    } else {
        result += letter;
    }
}
I'll leave the interpretation of the sample as an exercise to you :)
Upvotes: 0
Reputation: 111
Your replaces are fighting themselves.
Your first replace comes through and replaces all 'G' with 'C':
CCTACCCATC
Your second replace then comes through and replaces all 'T' with 'A':
CCAACCCAAC
Third replace then comes through and swaps all the 'A' (Including the 'T' you previously swapped to 'A') with 'T':
CCTTCCCTTC
Final replace sweeps through and swaps all the 'C' (Including the 'G' you previously swapped to 'C') with 'G':
GGTTGGGTTG
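If you want to watch this happen, a short sketch like the following prints the string after each replace in the chain. The variable names steps, current, and history are just for illustration.

```python
dna = "GGTACGGATG"

# The four replacements, in the order the original chain applies them.
steps = [("G", "C"), ("T", "A"), ("A", "T"), ("C", "G")]

current = dna
history = []
for old, new in steps:
    # Each replace runs on the result of the previous one,
    # so earlier substitutions get overwritten by later ones.
    current = current.replace(old, new)
    history.append(current)
    print(f"replace({old!r}, {new!r}) -> {current}")
```

The last line printed is GGTTGGGTTG, which is not the complement of the input.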
A single loop over the characters would work instead. As a disclaimer, this is my first time ever looking at Python, so this may not be a good bit of code!
contents = "GGTACGGATG"
replaced_contents = ""
for c in contents:
    if c == 'G':
        replaced_contents += 'C'
    elif c == 'C':
        replaced_contents += 'G'
    elif c == 'T':
        replaced_contents += 'A'
    elif c == 'A':
        replaced_contents += 'T'
    else:
        replaced_contents += c
print("Complement: {0}".format(replaced_contents))
Outputs:
Complement: CCATGCCTAC
You could also do this with a dictionary lookup:
contents = "GGTACGGATG"
replacement = {
    'G': 'C',
    'C': 'G',
    'T': 'A',
    'A': 'T'
}
replaced_contents = ""
for c in contents:
    # Fall back to the original character if it has no complement.
    replaced_contents += replacement.get(c, c)
print("Complement: {0}".format(replaced_contents))
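For longer inputs, the same dictionary lookup can be written with str.join instead of repeated +=, which avoids building many intermediate strings. This is only a sketch of that variation, reusing the names from the example above.

```python
contents = "GGTACGGATG"
replacement = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'T'}

# Look up each character once; unknown characters are kept as they are.
replaced_contents = "".join(replacement.get(c, c) for c in contents)
print("Complement: {0}".format(replaced_contents))
```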
Upvotes: 2
Reputation: 3128
As Felix has said, you have to treat it as an extended case of the two-variable swap problem, where you use a temporary value to hold the information, such as
temp = a
a = b
b = temp
Now if we apply it to a string, you would instead need to create a specific pattern to stand in for the first letter. For example,
print('GGTACGGATG'.replace('C', '{PH}').replace('G', 'C').replace('{PH}', 'G'))
would switch the C's and G's around.
Note that you also have to validate the placeholder carefully: it must not contain any of the replaced letters, and it must not already occur in the input.
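One way to do that validation is sketched below. The function name swap_letters and the choice to raise ValueError are assumptions made for this example, not something from the answers here.

```python
def swap_letters(text, a, b, placeholder="{PH}"):
    """Swap `a` and `b` in `text`, refusing an unsafe placeholder (hypothetical helper)."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    if a in placeholder or b in placeholder:
        raise ValueError("placeholder must not contain the letters being swapped")
    return text.replace(a, placeholder).replace(b, a).replace(placeholder, b)


print(swap_letters("GGTACGGATG", "C", "G"))  # CCTAGCCATC
```

This only swaps one pair at a time, so the T and A swap would need a second call.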
Upvotes: 0