user1647556


Using python to count nucleotide mutations in an alignment

I have a FASTA file with an alignment of multiple gene samples. I am trying to develop a program that can count the number of mutations for each sample. What's the best way to do this? Store each gene sample in a dictionary and compare them somehow?

Upvotes: 1

Views: 2439

Answers (2)

swang

Reputation: 219

Try reading the FASTA file and storing each sequence as a string. You can organize the sequences in a dictionary, using the text on the '>' header line as the key. If a gene has the same length as an unmutated reference sequence, [i for i, a in enumerate(gene) if a != reference[i]] returns a list of the mutation positions, and its length is the number of mutations. If the mutations involve missing or added bases (indels), it becomes much more complicated, because the sequences no longer line up position by position.
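A minimal sketch of the approach above, assuming every sequence has the same length as the reference and that the first record in the file is the unmutated reference. The helper names parse_fasta and mutation_positions and the example records are hypothetical, for illustration only. The parser keys the dictionary on the header text after '>' and joins multi-line sequences.

```python
import io


def parse_fasta(handle):
    """Read FASTA records into a dict mapping header text (after '>') to sequence."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # store the previous record
            name = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def mutation_positions(gene, reference):
    """Positions where gene differs from reference; both must be the same length."""
    if len(gene) != len(reference):
        raise ValueError("sequences must have equal length")
    return [i for i, a in enumerate(gene) if a != reference[i]]


# Hypothetical example data; assumes the first record is the reference.
fasta_text = """>ref
ACTGGTTGTC
>sample1
ACTGATTGTC
>sample2
ACAGGTTGTA
"""

records = parse_fasta(io.StringIO(fasta_text))
ref_name, reference = next(iter(records.items()))  # dicts keep insertion order (Python 3.7+)
for name, seq in records.items():
    if name == ref_name:
        continue
    positions = mutation_positions(seq, reference)
    print(name, len(positions), positions)
# sample1 1 [4]
# sample2 2 [2, 9]
```

To read a real file, pass an open file handle instead of io.StringIO, e.g. parse_fasta(open("samples.fasta")), where "samples.fasta" is a placeholder name.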

Upvotes: 0

Chrismit

Reputation: 1518

If the sequences are already aligned, the hard part is done: each column lines up corresponding positions, so identities and mismatches can be read off column by column. So you have something like this:

Aln1: ACTGGTTGTCCAACCGTAATCGAAG

Aln2: ---GGTTGTCCAATTC---TCGAAG

Capture each one in a string and iterate over the two strings in parallel. Something simple like this works:

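aln1 = "ACTGGTTGTCCAACCGTAATCGAAG"
aln2 = "---GGTTGTCCAATTC---TCGAAG"
mutations = 0
for i, j in zip(aln1, aln2):
    # count mismatched columns, ignoring any column with a gap
    if i != j and i != '-' and j != '-':
        mutations += 1
print(mutations)  # 3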

It depends on your own criteria, though, such as whether you want to count gaps as mutations.
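If you want to decide later whether gaps count, one option is to classify every aligned column instead of keeping a single counter. This is a sketch, not part of the answer above; the function name compare_aligned and the choice of '-' as the gap character are assumptions.

```python
def compare_aligned(aln1, aln2, gap="-"):
    """Classify each aligned column as a match, a substitution, or a gap (indel)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "substitution": 0, "gap": 0}
    for a, b in zip(aln1, aln2):
        if a == gap and b == gap:
            continue  # gap in both sequences: not informative, skip it
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


counts = compare_aligned("ACTGGTTGTCCAACCGTAATCGAAG",
                         "---GGTTGTCCAATTC---TCGAAG")
print(counts)  # {'match': 16, 'substitution': 3, 'gap': 6}
# Substitutions only, versus substitutions plus gap columns:
print(counts["substitution"], counts["substitution"] + counts["gap"])  # 3 9
```

Note that each gap column is counted once, so the 3-base gap above contributes 3. If you would rather count indel events, count runs of consecutive gap columns instead.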

Upvotes: 1
