I have the following sequence:
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]
Here is a dictionary that maps each codon (a triplet of bases such as ATG or GCT) to the amino acid it codes for:
aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}
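Python accepts a dict literal with a repeated key and silently keeps the last value, so a typo in a 64-entry table (for example, typing 'AGC' where 'AGG' was meant) is easy to miss. One alternative, sketched here under the assumption that the standard genetic code is wanted, is to generate the table from the code written as a 64-letter string in TCAG order and then check the entry count. `BASES` and `AMINO` are illustrative names, not from the question.

```python
from itertools import product

# Illustrative names: BASES fixes the base order; AMINO lists the standard
# genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"   # codons starting with T
         "LLLLPPPPHHQQRRRR"   # codons starting with C
         "IIIMTTTTNNKKSSRR"   # codons starting with A
         "VVVVAAAADDEEGGGG")  # codons starting with G
assert len(AMINO) == 64

# product(BASES, repeat=3) varies the last base fastest, matching AMINO's order.
aminoacid = {
    "".join(codon): ("STOP" if aa == "*" else aa)
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# 4 ** 3 distinct codons; a hand-typed dict with a repeated key would have fewer.
assert len(aminoacid) == 64
print(aminoacid["GGT"], aminoacid["AGG"], aminoacid["TGA"])  # G R STOP
```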
As you can see, several codons can code for the same amino acid (e.g. GGT, GGC, GGA and GGG all code for glycine, G). Codons that differ in their bases but code for the same amino acid are synonymous (PSyn); codons that code for different amino acids are non-synonymous (PNonsyn).
In this code, I need to do the following:
For each sublist in the list of lists (one site), if the codons differ in their bases but all code for the same amino acid, increment PSyn by 1; if they code for different amino acids, increment PNonsyn by 1.
For the seq above:
ATG, ATG, ATG, ATG all code for M, but the bases never change, so neither count is incremented.
GAC and GAT code for D, GAA for E and CCT for P: three different amino acids, so PNonsyn is incremented by 1.
GCC, GCG, GCA and GCT have different bases but all code for the same amino acid (A), so PSyn is incremented by 1.
Expected output:
CountPsyn = 1
CountPNonsyn = 1
I also need to generate a list that corresponds to the above seq, such that sites with identical bases list the codon, sites with different amino acids say 'nonsyn', and synonymous sites list the shared amino acid:
Output: ['ATG', 'nonsyn', 'A']
I need help modifying the following code to get the program to work. I am not confident about how to look up values in the dictionary and check them against all the elements. Attempted code:
countPsyn = 0
countPnonsyn = 0
listofaa =[]
for i in seq:
    for base, value in enumerate(i):
        if value[i] == value[i+1]: #eg. ['ATG','ATG','ATG','ATG']
            listofaa.append(value)
        if value[i] != value[i+1]:
            if aminoacid[value][i] == aminoacid[value][i+1]: #eg.['GCC','GCG','GCA','GCT']
                countPsyn =+ 1
                listofaa.append(aminoacid)
            else: #eg. ['GAC','GAT','GAA','CCT']
                countPnonsyn =+ 1
                listofaa.append('nonsyn')
File output can be found here: https://eval.in/669107
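For comparison, here is a sketch that keeps the overall shape of the attempted code (one pass per sublist) but fixes its main problems: `countPsyn =+ 1` assigns positive one rather than incrementing (`+=` is the increment operator), `value[i]` tries to index a codon string with a whole sublist, and `listofaa.append(aminoacid)` appends the entire dictionary instead of one amino acid. Each codon is compared with the first codon of its site. The `aminoacid` dictionary below is only an excerpt of the question's table, covering the codons in `seq`, so the block runs on its own; `site`, `first`, `same_bases` and `same_aa` are illustrative names.

```python
# Excerpt of the question's codon table (only the codons used in seq).
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

countPsyn = 0
countPnonsyn = 0
listofaa = []
for site in seq:
    first = site[0]
    # Do all codons at this site match the first codon exactly?
    same_bases = all(codon == first for codon in site)
    # Do all codons translate to the same amino acid as the first codon?
    same_aa = all(aminoacid[codon] == aminoacid[first] for codon in site)
    if same_bases:
        listofaa.append(first)             # identical bases: record the codon
    elif same_aa:
        countPsyn += 1                     # += increments; =+ 1 would assign 1
        listofaa.append(aminoacid[first])  # synonymous: record the amino acid
    else:
        countPnonsyn += 1
        listofaa.append('nonsyn')

print(countPsyn, countPnonsyn, listofaa)  # 1 1 ['ATG', 'nonsyn', 'A']
```

Because identical codons always translate to the same amino acid, the `same_bases` check has to come first; otherwise fully identical sites would be counted as synonymous.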
Here is my stab at the solution.
aminoacid = {'TTT' : 'F','TTC' : 'F','TTA' : 'L','TTG' : 'L','CTT' : 'L','CTC' : 'L','CTA' : 'L','CTG' : 'L','ATT' : 'I','ATC' : 'I','ATA' : 'I','ATG' : 'M','GTT' : 'V','GTC' : 'V','GTA' : 'V','GTG' : 'V','TCT' : 'S','TCC' : 'S','TCA' : 'S','TCG' : 'S','CCT' : 'P','CCC' : 'P','CCA' : 'P','CCG' : 'P','ACT' : 'T','ACC' : 'T','ACA' : 'T','ACG' : 'T','GCT' : 'A','GCC' : 'A','GCA' : 'A','GCG' : 'A','TAT' : 'Y','TAC' : 'Y','TAA' : 'STOP','TAG' : 'STOP','CAT' : 'H','CAC' : 'H','CAA' : 'Q','CAG' : 'Q','AAT' : 'N','AAC' : 'N','AAA' : 'K','AAG' : 'K','GAT' : 'D','GAC' : 'D','GAA' : 'E','GAG' : 'E','TGT' : 'C','TGC' : 'C','TGA' : 'STOP','TGG' : 'W','CGT' : 'R','CGC' : 'R','CGA' : 'R','CGG' : 'R','AGT' : 'S','AGC' : 'S','AGA' : 'R','AGG' : 'R','GGT' : 'G','GGC' : 'G','GGA' : 'G','GGG' : 'G'}
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]
Psyn = 0
PNonsyn = 0
output = []
# loop through each list in your list of lists
for sublist in seq:
    acids = [aminoacid[base] for base in sublist]
    if len(set(acids)) != 1:  # if there are different amino acids, then nonsyn
        output.append('nonsyn')
        PNonsyn += 1
    else:  # if same amino acid
        if len(set(sublist)) == 1:  # if same base
            output.append(sublist[0])
        else:  # if not same base
            output.append(acids[0])
            Psyn += 1
print("Psyn = " + str(Psyn))
print("PNonsyn = " + str(PNonsyn))
print(output)
Admittedly it's not a modification of your code, but there is a neat trick here to avoid the double for loop. Given a list mylist, you can find all unique elements in it by calling set(mylist). E.g.
>>> a = ['AGT','AGT','ACG']
>>> set(a)
set(['AGT', 'ACG'])
>>> len(set(a))
2
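Applied to this problem, the set trick works at two levels: the number of distinct codons and the number of distinct amino acids at a site together decide its category. The sketch below uses a hypothetical `site_category` helper, `collections.Counter` to tally the categories across all sites, and an excerpt of the question's codon table so the block is self-contained.

```python
from collections import Counter

# Excerpt of the question's codon table (only the codons used in seq).
aminoacid = {'ATG': 'M', 'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'CCT': 'P',
             'GCC': 'A', 'GCG': 'A', 'GCA': 'A', 'GCT': 'A'}
seq = [['ATG','ATG','ATG','ATG'],['GAC','GAT','GAA','CCT'],['GCC','GCG','GCA','GCT']]

def site_category(codons):
    """Hypothetical helper: classify one site by distinct codons and amino acids."""
    n_codons = len(set(codons))
    n_acids = len(set(aminoacid[c] for c in codons))
    if n_acids > 1:
        return 'nonsyn'  # more than one amino acid at this site
    if n_codons > 1:
        return 'syn'     # bases differ, amino acid is the same
    return 'same'        # every codon is identical

categories = [site_category(site) for site in seq]
counts = Counter(categories)
print(categories)                       # ['same', 'nonsyn', 'syn']
print(counts['syn'], counts['nonsyn'])  # 1 1
```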