Reputation: 3
I have a DNA sequence file, and I want to randomly replace 5 of the nucleotides in the sequence with the letter M.
For example, dna1.txt contains the sequence ACTGGCTACATTG.
I want to turn ACTGGCTACATTG into something like ACMMGCMMCATMG.
I know how to replace one letter at a time, but not several at once.
dna1 = open("dna1.txt", "r")
data1 = dna1.read()
from random import randint, choice
def Mutated_DNA(data1):
    dna_list = list(data1)
    mutation_site = randint(0, len(dna_list) - 1)
    dna_list[mutation_site] = choice(list('M'))
    return ''.join(dna_list)
print(Mutated_DNA(data1))
What should I do?
Upvotes: 0
Views: 1895
Reputation: 353149
If you want to replace exactly 5 characters with something new, then I think the simplest way is to sample from the possible positions and then change exactly those. For example:
from random import sample
def mutate(s, num, target):
    change_locs = set(sample(range(len(s)), num))
    changed = (target if i in change_locs else c for i, c in enumerate(s))
    return ''.join(changed)
e.g.
>>> mutate('ABC', 2, 'M')
'MMC'
>>> mutate('ABC', 2, 'M')
'AMM'
>>> mutate('ABC', 2, 'M')
'MMC'
>>> mutate('ABC', 2, 'M')
'MBM'
or
def mutate(s, num, target):
    change_locs = sample(range(len(s)), num)
    new_s = list(s)
    for change_loc in change_locs:
        new_s[change_loc] = target
    return ''.join(new_s)
etc.
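To tie this back to the question, here is a minimal sketch of applying the sampling approach to a sequence read from a file. `mutate_file` is a hypothetical helper name, and the sketch assumes the file holds a single sequence, possibly followed by a newline. Stripping whitespace first matters because otherwise the trailing newline could be picked as one of the 5 positions. Also note that `sample` raises `ValueError` if you ask for more positions than the sequence has.

```python
from random import sample


def mutate(s, num, target):
    """Replace exactly `num` distinct, randomly chosen positions of `s` with `target`."""
    change_locs = set(sample(range(len(s)), num))
    return ''.join(target if i in change_locs else c for i, c in enumerate(s))


def mutate_file(path, num=5, target='M'):
    """Hypothetical helper: read one sequence from `path` and mutate it.

    Assumes the file contains a single sequence; surrounding whitespace
    (such as a trailing newline) is stripped so it cannot be selected.
    """
    with open(path) as f:
        seq = f.read().strip()
    return mutate(seq, num, target)


# In-memory demonstration using the sequence from the question,
# with a trailing newline as it might come from f.read().
sequence = "ACTGGCTACATTG\n".strip()
mutated = mutate(sequence, 5, 'M')
print(mutated)  # output varies from run to run
```

With the file from the question, the call would be `mutate_file("dna1.txt")`. Because the positions are sampled without replacement, exactly 5 distinct positions are changed every time, which repeated calls to `randint` would not guarantee.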
Upvotes: 1