user11381544

I tried to replace all characters in a string but it's still not working as intended?

As a smaller part of a function, this code is intended to replace all 'G's with 'C's and all 'T's with 'A's, and vice versa for both. It doesn't work as intended: it only replaces some of the characters instead of all of them.

Contents of dna1.txt: GGTACGGATG

file = open('dna1.txt')
contents = file.read()
replaced_contents = (contents.replace('G', 'C').replace('T', 'A')
                     .replace('A', 'T').replace('C', 'G'))
print("Complement: {0}".format(replaced_contents))

Upvotes: 1

Views: 60

Answers (4)

Mark Tolonen

Reputation: 177901

Others have pointed out the issue with chained .replace calls, but strings also have a built-in translate method that does the job quickly and easily:

dna = "GGTACGGATG"
xlat = str.maketrans('GTAC','CATG') # build translation table.
result = dna.translate(xlat)        # translate using table.
print(result)

Output:

CCATGCCTAC
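As a small sketch of how this could be wrapped up for reuse: the function name `complement` and the constant `COMPLEMENT` are made up for illustration. It also shows that characters missing from the table, such as the trailing newline a file read often returns, pass through unchanged.

```python
# Build the table once; str.maketrans maps each character of the first
# string to the character at the same position in the second string.
COMPLEMENT = str.maketrans("GTAC", "CATG")


def complement(dna):
    """Return the complement of a DNA string (hypothetical helper)."""
    # translate() looks up every character in the table exactly once,
    # so earlier substitutions cannot be undone by later ones.
    return dna.translate(COMPLEMENT)


# The newline has no entry in the table, so it is left as-is.
print(repr(complement("GGTACGGATG\n")))  # 'CCATGCCTAC\n'
```

Because each character is looked up once, applying `complement` twice gives back the original string.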

Upvotes: 1

Hugo G

Reputation: 16516

Let's run your code step by step:

input = "GGTACGGATG"

Replace all Gs with Cs

input = input.replace('G', 'C')

Now input looks like this:

CCTACCCATC

Wait! We also want to replace all Cs with Gs. How do we know which of these Cs were originally G and which were C?

This approach doesn't work.


How about we use a token instead of C for the first replacement and then replace the tokens again with G in the end? The token should be something that doesn't occur in the text, and it can be one or more characters. Let's use # here. It can be anything, really.

input = input.replace('G', '#')

Now input looks like this:

##TAC##AT#

Okay, let's change all C to G now.

input = input.replace('C', 'G')

and we get

##TAG##AT#

Yay! Now we don't have any Cs left and we know what the Gs were. So let's replace the # now.

input = input.replace('#', 'C')

And we get

CCTAGCCATC

Done! All Gs and Cs have just swapped places!

But how do we find a token that definitely doesn't come up in the input? ...we can't. So, if the input is coming from users and could be anything, DO NOT USE TOKENS for replacement.
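To make the risk concrete, here is a minimal Python sketch of the token approach. The helper name `swap_with_token` is hypothetical; it works on the sample input but silently corrupts an input that already contains the token.

```python
def swap_with_token(text, token="#"):
    """Swap G and C by way of a temporary token (illustrative only)."""
    return text.replace("G", token).replace("C", "G").replace(token, "C")


# Works when the token never occurs in the input.
print(swap_with_token("GGTACGGATG"))  # CCTAGCCATC

# Breaks when it does: the original '#' is treated like a former 'G'.
print(swap_with_token("G#C"))  # CCG, not the intended C#G
```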


The replace method scans the whole string and replaces every occurrence of the old letter with the new one. Each further replacement runs that scan again, on the already-modified string.

The most universal and safest method, however, is to run a single loop and do all replacements at once, like so:

let result = ""
const letters_array = input.split('')
for (const letter of letters_array) {
  if (letter == 'C') {
    result += 'G'
  } else if (letter == 'G') {
    result += 'C'
  } else {
    result += letter
  }
}

I'll leave the interpretation of the sample as an exercise to you :)
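For reference, one possible Python rendering of the loop above. This is a sketch: the function name `swap_c_and_g` is invented here, and it swaps only C and G, as the sample does.

```python
def swap_c_and_g(text):
    """Swap every 'C' and 'G' in a single pass over the input."""
    pieces = []
    for letter in text:
        if letter == "C":
            pieces.append("G")
        elif letter == "G":
            pieces.append("C")
        else:
            # Any other character is copied through unchanged.
            pieces.append(letter)
    return "".join(pieces)


print(swap_c_and_g("GGTACGGATG"))  # CCTAGCCATC
```

Each input character is read exactly once and the output is built separately, so no token is needed.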

Upvotes: 0

Blaise

Reputation: 111

Your replaces are fighting themselves.

Your first replace comes through and replaces all 'G' with 'C':

CCTACCCATC

Your second replace then comes through and replaces all 'T' with 'A':

CCAACCCAAC

Your third replace then comes through and swaps all the 'A' (including the 'T' you previously swapped to 'A') with 'T':

CCTTCCCTTC

Your final replace sweeps through and swaps all the 'C' (including the 'G' you previously swapped to 'C') with 'G':

GGTTGGGTTG
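The trace above can be reproduced with a short loop that applies the same four replacements in the same order and prints the string after each one. The variable names `steps`, `current` and `history` are just for illustration.

```python
dna = "GGTACGGATG"

# The four replacements from the question, in the order they are chained.
steps = [("G", "C"), ("T", "A"), ("A", "T"), ("C", "G")]

current = dna
history = []
for old, new in steps:
    # Each replace works on the result of the previous one.
    current = current.replace(old, new)
    history.append(current)
    print(f"replace({old!r}, {new!r}) -> {current}")
```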

The following loop would work instead. As a disclaimer, this is my first time ever looking at Python, so this may not be a good bit of code!

contents = "GGTACGGATG"
replaced_contents = ""
for c in contents:
  if c == 'G':
    replaced_contents += 'C'
  elif c == 'C':
    replaced_contents += 'G'
  elif c == 'T':
    replaced_contents += 'A'
  elif c == 'A':
    replaced_contents += 'T'
  else:
    replaced_contents += c

print("Complement: {0}" .format(replaced_contents))

Outputs: Complement: CCATGCCTAC

You could also do this with a dictionary lookup:

contents = "GGTACGGATG"
replacement = {
    'G': 'C',
    'C': 'G',
    'T': 'A',
    'A': 'T',
}
replaced_contents = ""
for c in contents:
    # Fall back to the original character when it has no mapping.
    replaced_contents += replacement.get(c, c)

print("Complement: {0}".format(replaced_contents))

Upvotes: 2

Simas Joneliunas

Reputation: 3128

As Felix has said, you have to treat it as an extended case of the two-variable swap problem, where you use a temporary value to hold the information, such as

   temp = a
   a = b
   b = temp

Applied to a string, you instead need a specific placeholder pattern that temporarily stands in for the first letter. For example, print('GGTACGGATG'.replace('C','{PH}').replace('G', 'C').replace('{PH}', 'G')) would switch the C's and G's around.

Note that you also have to validate the placeholder carefully, so that it does not include any of the letters being replaced.
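A minimal sketch of what such validation might look like. The function name `swap_with_placeholder` and its parameters are assumptions made for illustration. The checks cover the obvious collisions but are not claimed to be sufficient for every placeholder, for example one whose beginning and end overlap.

```python
def swap_with_placeholder(text, a, b, placeholder="{PH}"):
    """Swap characters a and b using a temporary placeholder.

    Raises ValueError if the placeholder could collide with the data.
    """
    if placeholder in text:
        raise ValueError("placeholder already occurs in the input")
    if a in placeholder or b in placeholder:
        raise ValueError("placeholder must not contain the swapped letters")
    # Same three-step swap as above: a -> placeholder, b -> a, placeholder -> b.
    return text.replace(a, placeholder).replace(b, a).replace(placeholder, b)


print(swap_with_placeholder("GGTACGGATG", "C", "G"))  # CCTAGCCATC
```

Raising an error keeps a bad placeholder from silently producing a wrong result.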

Upvotes: 0
