Reputation: 113
I am writing a program in which I want to use 3 variables in my for loop, and every variable should start from a different index. Here is a snippet from my code. I have a table named codons that holds a value for every three-letter string, e.g. 'atg':'F', 'ggg':'Q', 'ttg':'E', etc.
seq='atgggggggcccccc'
seqlen= len(seq)
aaseq1=[]
aaseq2=[]
aaseq3=[]
for i in range(0,seqlen,3):
    codon1 = seq[i:i+3]
    aa1 = codons[codon1]
    aaseq1.append(aa1)
print ''.join(aaseq1)
In this code, I am running variable i from position 0, but I want to use 2 more variables (j and k) which will run from 1 and 2 respectively and append their results to the aaseq2 and aaseq3 lists.
codon2 = seq[j:j+3]
codon3 = seq[k:k+3]
Upvotes: 2
Views: 1329
Reputation: 20344
There are various ways to do this, and the best one to use depends on how big your seq is in real life. The other answers give some nice ways to utilise Python features to avoid building lists explicitly.
I'll give you one solution that goes through every group of three consecutive letters, but assigns each group to one of three lists depending on whether i%3 is 0, 1 or 2.
To me this is very easy to read for the example you've given, and it retains roughly the data structure you began with and are, I assume, familiar with. I've taken the liberty of adding a truncated dict for codons so that the code runs.
codons = {'atg':'Methionine','tgg':'Tryptophan','ggg':'Glycine',
'ggc':'Glycine','gcc':'Alanine','ccc':'Proline'}
seq='atgggggggcccccc'
seqlen= len(seq)
aaseq=[[],[],[]]
for i in range(seqlen-2):
    codon = seq[i:i+3]
    aa = codons[codon]
    aaseq[i%3].append(aa)
print 'aaseq1 ='
print ''.join(aaseq[0])
print 'aaseq2 ='
print ''.join(aaseq[1])
print 'aaseq3 ='
print ''.join(aaseq[2])
This gives the output:
aaseq1 =
MethionineGlycineGlycineProlineProline
aaseq2 =
TryptophanGlycineGlycineProline
aaseq3 =
GlycineGlycineAlanineProline
If you want a more concise form - try this:
#Make every codon by zipping the sequence offset by one each time
codon_sequence = [''.join(z) for z in zip(seq,seq[1:],seq[2:])]
#Print every 3rd codon - starting at zero...
print 'aaseq1 = ',''.join([codons[c] for c in codon_sequence[::3]])
#...then starting at 1...
print 'aaseq2 = ',''.join([codons[c] for c in codon_sequence[1::3]])
#...you get the picture...
print 'aaseq3 = ',''.join([codons[c] for c in codon_sequence[2::3]])
Of course - rather than printing the sequences as the last step, you can assign them to variables if you need to do further processing.
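If it helps, here is a minimal sketch of the same idea wrapped in a function that returns the three lists instead of printing them. The function name translate_frames is hypothetical (it is not from the question), the codon table is the same truncated one as above, and the syntax is written to run on Python 3.

```python
# Truncated, illustrative codon table (same entries as in the answer above).
codons = {'atg': 'Methionine', 'tgg': 'Tryptophan', 'ggg': 'Glycine',
          'ggc': 'Glycine', 'gcc': 'Alanine', 'ccc': 'Proline'}

def translate_frames(seq, table):
    """Return one list of amino acids per reading frame (offsets 0, 1, 2)."""
    frames = []
    for offset in range(3):
        # Stop at len(seq) - 2 so that every slice is a full three-letter codon.
        frames.append([table[seq[i:i + 3]]
                       for i in range(offset, len(seq) - 2, 3)])
    return frames

aaseq1, aaseq2, aaseq3 = translate_frames('atgggggggcccccc', codons)
print(''.join(aaseq1))
print(''.join(aaseq2))
print(''.join(aaseq3))
```

Each frame gets its own range with a different start index, which is close to what the question asked for with i, j and k.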
Upvotes: 2
Reputation: 3745
I usually try not to use explicit loops in Python. While (as @PM-2ring points out in a comment) list comprehensions are not necessarily any faster than explicit loops, some find that they can write, understand, and debug code a lot faster by letting Python handle the details of iterating over data as much as possible.
Below are a few versions of your program, ultimately "pythonified" down to four lines, just to see where it goes.
Usually there are ways to let Python do things for you, using indexing and list comprehensions. These can be concise and powerful, and many Python functions just do what you would want them to. For example, zip just drops the dangling bases at the end without complaining.
The print statements are there just to show what's happening; delete them later.
seq='atgggggggcccccc'
s1 = seq[0::3] # you can drop the zero, it's just for readability
s2 = seq[1::3]
s3 = seq[2::3]
c1 = zip( s1, s2, s3 )
c2 = zip( s2, s3, s1[1:] ) # frame shift of 1
c3 = zip( s3, s1[1:], s2[1:] ) # frame shift of 2
print c1
print c2
print c3
co1 = [bp1+bp2+bp3 for (bp1, bp2, bp3) in c1]
co2 = [bp1+bp2+bp3 for (bp1, bp2, bp3) in c2]
co3 = [bp1+bp2+bp3 for (bp1, bp2, bp3) in c3]
print co1
print co2
print co3
# codons is a dict keyed by triplet strings, so index it with the strings in co1..co3
aaseq1 = [codons[codon] for codon in co1]
aaseq2 = [codons[codon] for codon in co2]
aaseq3 = [codons[codon] for codon in co3]
...which could also be written like this:
s1, s2, s3 = [seq[i::3] for i in range(3)] # use list comprehension and unpacking
c1 = zip( s1, s2, s3 )
c2 = zip( s2, s3, s1[1:] ) # frame shift of 1
c3 = zip( s3, s1[1:], s2[1:] ) # frame shift of 2
co1, co2, co3 = [[tr[0]+tr[1]+tr[2] for tr in c] for c in [c1,c2,c3]]
aaseq1, aaseq2, aaseq3 = [[codons[trip] for trip in co] for co in [co1, co2, co3]]
That was just to advertise more python. For beginners it may be less readable.
This is a further pythonification (just to see where this goes...):
S = [seq[i::3] for i in range(3)] # three reading frames
C = zip(S[0], S[1], S[2]), zip(S[1], S[2], S[0][1:]), zip(S[2], S[0][1:], S[1][1:]) # group
CO = [[''.join(tr) for tr in c] for c in C] # tuples to triplet strings
AASEQs = [[codons[trip] for trip in co] for co in CO] # look up Amino Acids
and finally, if you want to change the three AA sequences into just three long strings:
final_AASEQs = [''.join(AASEQ) for AASEQ in AASEQs]
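For anyone on Python 3, where zip returns a lazy iterator rather than a list, a sketch of the same slicing idea might look like the following. The truncated lower-case codon table and the names frames, triplets and aaseqs are assumptions for illustration only; the first line just demonstrates that zip stops at its shortest argument.

```python
# zip stops at the shortest input, so an incomplete trailing codon is dropped.
print(list(zip('abc', 'ab')))

# Truncated, lower-case codon table (an assumption, to match the question's seq).
codons = {'atg': 'Methionine', 'tgg': 'Tryptophan', 'ggg': 'Glycine',
          'ggc': 'Glycine', 'gcc': 'Alanine', 'ccc': 'Proline'}
seq = 'atgggggggcccccc'

# One shifted copy of the sequence per reading frame.
frames = [seq[offset:] for offset in range(3)]

# Within each frame, zip every 1st, 2nd and 3rd letter back into triplets.
triplets = [[''.join(t) for t in zip(f[0::3], f[1::3], f[2::3])]
            for f in frames]

# Look up each triplet and join the results into one string per frame.
aaseqs = [''.join(codons[t] for t in frame) for frame in triplets]
print(aaseqs)
```

Shifting the sequence first and then slicing keeps the three frames symmetric, at the cost of making two extra copies of the string.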
Just for fun, here is what the dictionary codons might look like (from Wikipedia). Note the upper case for the A, T, G, C bases, so the sequence in the question would become seq = 'ATGGGGGGGCCCCCC'.
codons = {'CTT': 'Leu', 'ATG': 'Met', 'AAG': 'Lys', 'AAA': 'Lys', 'ATC': 'Ile',
'AAC': 'Asn', 'ATA': 'Ile', 'AGG': 'Arg', 'CCT': 'Pro', 'ACT': 'Thr',
'AGC': 'Ser', 'ACA': 'Thr', 'AGA': 'Arg', 'CAT': 'His', 'AAT': 'Asn',
'ATT': 'Ile', 'CTG': 'Leu', 'CTA': 'Leu', 'CTC': 'Leu', 'CAC': 'His',
'ACG': 'Thr', 'CAA': 'Gln', 'AGT': 'Ser', 'CAG': 'Gln', 'CCG': 'Pro',
'CCC': 'Pro', 'TAT': 'Tyr', 'GGT': 'Gly', 'TGT': 'Cys', 'CGA': 'Arg',
'CCA': 'Pro', 'CGC': 'Arg', 'GAT': 'Asp', 'CGG': 'Arg', 'TTT': 'Phe',
'TGC': 'Cys', 'GGG': 'Gly', 'TAG': 'STOP', 'GGA': 'Gly', 'TGG': 'Trp',
'GGC': 'Gly', 'TAC': 'Tyr', 'GAG': 'Glu', 'TCG': 'Ser', 'TTA': 'Leu',
'GAC': 'Asp', 'CGT': 'Arg', 'GAA': 'Glu', 'TCA': 'Ser', 'GCA': 'Ala',
'GTA': 'Val', 'GCC': 'Ala', 'GTC': 'Val', 'GCG': 'Ala', 'GTG': 'Val',
'TTC': 'Phe', 'GTT': 'Val', 'GCT': 'Ala', 'ACC': 'Thr', 'TGA': 'STOP',
'TTG': 'Leu', 'TCC': 'Ser', 'TAA': 'STOP', 'TCT': 'Ser'} # ATG also START
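Because the table uses upper-case keys while the question's sequence is lower case, one option is to normalise the case at lookup time. This is only a sketch: the helper name lookup and the '?' placeholder for codons missing from the table are assumptions, and the table here is a small subset of the full one.

```python
# Small subset of the upper-case table, just enough for this example.
codons = {'ATG': 'Met', 'TGG': 'Trp', 'GGG': 'Gly',
          'GGC': 'Gly', 'GCC': 'Ala', 'CCC': 'Pro'}

def lookup(codon, table, unknown='?'):
    """Translate one codon, ignoring case; `unknown` marks codons not in the table."""
    return table.get(codon.upper(), unknown)

seq = 'atgggggggcccccc'
# First reading frame only; the other frames would start at offsets 1 and 2.
frame1 = [lookup(seq[i:i + 3], codons) for i in range(0, len(seq) - 2, 3)]
print('-'.join(frame1))
```

Using dict.get with a placeholder avoids a KeyError on unexpected input such as 'nnn'; if a missing codon should be treated as an error, plain indexing is the better choice.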
Upvotes: 2
Reputation: 35089
You seem to be looking for the built-in zip function, which allows for lock-step iteration. Combined with tuple unpacking, it works like this:
>>> for i,j in zip(range(3), range(10,13)):
...     print(i,j)
...
0 10
1 11
2 12
The arguments to zip are anything you can normally use in that part of the for loop, and you can have any number of them (as long as you assign to the same number of variables, or to only one variable which will be a tuple at each iteration).
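Applied to the question, a rough sketch could zip three ranges that start at 0, 1 and 2. The list names codon1, codon2 and codon3 are assumptions chosen to mirror the question's variables. One caveat: zip stops at the shortest range, so here the fifth codon of the first frame (starting at index 12) is never visited.

```python
seq = 'atgggggggcccccc'
stop = len(seq) - 2  # last valid start index is len(seq) - 3

codon1, codon2, codon3 = [], [], []
# i, j and k advance in lock step, each from its own start index.
for i, j, k in zip(range(0, stop, 3), range(1, stop, 3), range(2, stop, 3)):
    codon1.append(seq[i:i + 3])
    codon2.append(seq[j:j + 3])
    codon3.append(seq[k:k + 3])

print(codon1)
print(codon2)
print(codon3)
```

If every codon of every frame is needed, looping over each frame separately avoids that truncation.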
Upvotes: 2