Improving some snippets

Question

I have these functions to simulate mutations in DNA sequences (a sequence of letters, for example 'ACGTGCTTAGG').

The first one just changes the base at a random position of the input sequence:

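import random

def mutate(sequence):
    # Substitution (SNV): overwrite one random position with a random base.
    # Note: the new base can equal the old one, so about 1 in 4 calls change nothing.
    seq_lst = list(sequence)
    i = random.randint(0, len(seq_lst) - 1)
    seq_lst[i] = random.choice('ATCG')
    return ''.join(seq_lst)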
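A possible refinement, sketched under the assumption that an SNV should always change the base: draw the replacement only from the three bases that differ from the current one. `substitute` is a hypothetical name, not part of the original code.

```python
import random

BASES = 'ACGT'

def substitute(sequence):
    """Replace one random position with a base different from the current one."""
    seq_lst = list(sequence)
    i = random.randrange(len(seq_lst))
    # Only offer bases that differ from the current one,
    # so every call produces a real single-nucleotide change.
    seq_lst[i] = random.choice([b for b in BASES if b != seq_lst[i]])
    return ''.join(seq_lst)

mutated = substitute('ACGTGCTTAGG')
print(mutated)
```

The result always has the same length and differs from the input at exactly one position.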

The second one simulates the insertion of a base at a random position in the sequence:

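def insertion(sequence):
    # Insertion: add one random base at a random position.
    seq_lst = list(sequence)
    # len(seq_lst) is a valid slot too, so the base can also be appended at the end.
    i = random.randint(0, len(seq_lst))
    mutated = seq_lst[:i] + [random.choice('ATCG')] + seq_lst[i:]
    return ''.join(mutated)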
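The same slicing idea gives a single-base deletion. The sketch below is a hypothetical `deletion` helper (not in the original code) that removes the base at one random position; passing `gap='-'` marks that position with a gap character instead, like the original `'-'`, and keeps the length unchanged.

```python
import random

def deletion(sequence, gap=''):
    """Remove the base at one random position (hypothetical helper).

    With gap='-' the position is marked instead, so the length is preserved.
    """
    i = random.randrange(len(sequence))
    # Keep everything before and after position i; put `gap` in its place.
    return sequence[:i] + gap + sequence[i + 1:]

print(deletion('ACGTGCTTAGG'))       # one base shorter
print(deletion('ACGTGCTTAGG', '-'))  # same length, exactly one '-'
```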

The last one randomly picks one of the possible kinds of mutation (substitution, deletion, insertion, or none) and applies it to the sequence:

def mutations(sequence):
    i = random.randint(0, 3)  # 0..3 inclusive: each branch has probability 1/4
    print(i)
    if i == 0:
        print('SNV')
        return mutate(sequence)
    elif i == 1:
        print('Del')
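        # Note: str.replace swaps every occurrence of the chosen base for '-',
        # not just a single position.
        return sequence.replace(random.choice('ATCG'), '-')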
    elif i == 2:
        print('Ins')
        return insertion(sequence)
    elif i == 3:
        print('No mut')
        return sequence
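A quick check of what the `Del` branch actually does: `str.replace` substitutes every occurrence of the chosen base, so a single "deletion" event marks roughly a quarter of the sequence as `'-'` rather than one position. Using the example sequence from the question:

```python
seq = 'ACGTGCTTAGG'
# Every 'G' is replaced, not just one of them.
marked = seq.replace('G', '-')
print(marked)             # AC-T-CTTA--
print(marked.count('-'))  # 4
```

This is likely why deletions appear to dominate over many runs. `str.replace` also accepts an optional `count` argument (e.g. `seq.replace('G', '-', 1)`), but that only touches the first occurrence, so deleting at a random position is a closer model.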

The print statements are just to check that the code works as intended.

Any suggestions for improvement? If possible, I would also like suggestions on how to add mutation probabilities to the code to simulate a more realistic situation.

After running the process 10,000 times, I saw that the sequence accumulates a lot of deletions. That is wrong, since single-point mutations are more frequent, followed by insertions and deletions, which occur less often.
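One way to add probabilities is `random.choices`, which draws from a population using relative `weights`. The sketch below assumes illustrative weights (SNV most frequent, then insertions, then deletions); they are placeholders, not measured mutation rates, so replace them with values from your own model. The helper names are hypothetical, and this `mutations` returns the mutation type together with the new sequence so events can be counted instead of printed.

```python
import random
from collections import Counter

BASES = 'ACGT'

# Illustrative placeholder weights, NOT measured mutation rates.
MUTATION_WEIGHTS = {'SNV': 0.6, 'Ins': 0.15, 'Del': 0.1, 'No mut': 0.15}

def substitute(seq):
    i = random.randrange(len(seq))
    # Pick a base different from the current one so the SNV is real.
    new = random.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new + seq[i + 1:]

def insert(seq):
    i = random.randint(0, len(seq))  # len(seq) allows appending at the end
    return seq[:i] + random.choice(BASES) + seq[i:]

def delete(seq):
    i = random.randrange(len(seq))  # remove exactly one base
    return seq[:i] + seq[i + 1:]

OPERATIONS = {'SNV': substitute, 'Ins': insert, 'Del': delete, 'No mut': lambda s: s}

def mutations(sequence, weights=MUTATION_WEIGHTS):
    """Pick one mutation type according to its weight and apply it.

    Returns (kind, new_sequence) so callers can count events without print().
    """
    kinds = list(weights)
    kind = random.choices(kinds, weights=[weights[k] for k in kinds], k=1)[0]
    return kind, OPERATIONS[kind](sequence)

random.seed(1)  # reproducible run
counts = Counter(mutations('ACGTGCTTAGG')[0] for _ in range(10_000))
print(counts)
```

Over many draws the counts should roughly follow the weights. To apply mutations cumulatively, feed each returned sequence back into the next call.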

Thanks
