mohammad shahbaz Khan

Reputation: 113

Using a for loop in python with more than 1 variable

I am writing a program in which I want to use three variables in my for loop, each starting from a different index. Here is a snippet from my code. I have a lookup table named codons that holds a value for every three-letter codon, e.g. 'atg':'F', 'ggg':'Q', 'ttg':'E', etc.

seq='atgggggggcccccc'
seqlen= len(seq)
aaseq1=[]
aaseq2=[]
aaseq3=[]
for i in range(0,seqlen,3):
     codon1 = seq[i:i+3]
     aa1 = codons[codon1]
     aaseq1.append(aa1)  
print ''.join(aaseq1)  

In this code, the variable i runs from position 0, but I want to use two more variables (j and k) that run from positions 1 and 2 respectively and append their results to the aaseq2 and aaseq3 lists:

 codon2 = seq[j:j+3]
 codon3 = seq[k:k+3]

Upvotes: 2

Views: 1329

Answers (3)

J Richard Snape

Reputation: 20344

There are various ways to do this, and the best one depends on how big your seq is in real life. The other answers give some nice ways to use Python features to avoid building lists explicitly.

I'll give you one solution that goes through every group of three consecutive letters and assigns each group to one of three lists, depending on whether i%3 is 0, 1 or 2.

For the example you've given, I find this very easy to read, and it keeps roughly the data structure you began with, which I assume you are familiar with. I've taken the liberty of adding a truncated dict for codons so that the code runs.

codons = {'atg':'Methionine','tgg':'Tryptophan','ggg':'Glycine',
          'ggc':'Glycine','gcc':'Alanine','ccc':'Proline'}

seq='atgggggggcccccc'
seqlen= len(seq)
aaseq=[[],[],[]]

for i in range(seqlen-2):
     codon = seq[i:i+3]    
     aa = codons[codon]
     aaseq[i%3].append(aa)  

print 'aaseq1 ='
print ''.join(aaseq[0])  
print 'aaseq2 ='
print ''.join(aaseq[1])  
print 'aaseq3 ='
print ''.join(aaseq[2])  

This gives the output:

aaseq1 =
MethionineGlycineGlycineProlineProline
aaseq2 =
TryptophanGlycineGlycineProline
aaseq3 =
GlycineGlycineAlanineProline

If you want a more concise form - try this:

#Make every codon by zipping the sequence offset by one each time
codon_sequence = [''.join(z) for z in zip(seq,seq[1:],seq[2:])]
#Print every 3rd codon - starting at zero...
print 'aaseq1 = ',''.join([codons[c] for c in codon_sequence[::3]])
#...then starting at 1...
print 'aaseq2 = ',''.join([codons[c] for c in codon_sequence[1::3]])
#...you get the picture...
print 'aaseq3 = ',''.join([codons[c] for c in codon_sequence[2::3]])

Of course - rather than printing the sequences as the last step, you can assign them to variables if you need to do further processing.
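
As a rough sketch of that last point, the slicing approach can be wrapped in a function that returns the three translations instead of printing them. The function name translate_frames is made up for illustration, the codon table is the truncated one from this answer, and the code is written for Python 3 (print as a function).

```python
# Truncated codon table, copied from the answer above; a real table has 64 entries.
codons = {'atg': 'Methionine', 'tgg': 'Tryptophan', 'ggg': 'Glycine',
          'ggc': 'Glycine', 'gcc': 'Alanine', 'ccc': 'Proline'}


def translate_frames(seq, table):
    """Return one joined translation per reading frame (offsets 0, 1, 2)."""
    results = []
    for offset in range(3):
        # Step by 3 from this offset; stopping at len(seq) - 2 skips incomplete codons.
        frame = [table[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3)]
        results.append(''.join(frame))
    return results


aaseq1, aaseq2, aaseq3 = translate_frames('atgggggggcccccc', codons)
print(aaseq1)
print(aaseq2)
print(aaseq3)
```

Because the table is indexed directly, a codon missing from the dict raises a KeyError, which is probably what you want while the table is incomplete.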

Upvotes: 2

uhoh

Reputation: 3745

I usually try not to write explicit loops in Python. While (as @PM-2ring points out in a comment) list comprehensions are not necessarily any faster than explicit loops, some find that they can write, understand, and debug code a lot faster by letting Python handle the details of iterating over data as much as possible.

Below are a few versions of your program, ultimately "pythonified" down to four lines, just to see where it goes.

Usually there are ways to let Python do things for you, using indexing and list comprehensions. These can be concise and powerful, and many Python functions just do what you would want them to. For example, zip just drops the dangling base pairs at the end without complaining.

The print statements are there just to show what's happening; delete them later, of course.

seq='atgggggggcccccc'

s1 = seq[0::3]  # you can drop the zero, it's just for readability
s2 = seq[1::3]
s3 = seq[2::3]

c1 = zip( s1,    s2,     s3     )
c2 = zip( s2,    s3,     s1[1:] )  # frame shift of 1
c3 = zip( s3,    s1[1:], s2[1:] )  # frame shift of 2

print c1
print c2
print c3

co1 = [bp1+bp2+bp3 for (bp1, bp2, bp3) in c1]
co2 = [bp1+bp2+bp3 for (bp1, bp2, bp3) in c2]
co3 = [bp1+bp2+bp3 for (bp1, bp2, bp3) in c3]

print co1
print co2
print co3

# codons is a dict, so index it with [] and look up the joined codon strings
aaseq1 = [codons[thing] for thing in co1]
aaseq2 = [codons[thing] for thing in co2]
aaseq3 = [codons[thing] for thing in co3]

...which could also be written like this:

s1, s2, s3 = [seq[i::3] for i in range(3)]   # use list comprehension and unpacking

c1 = zip( s1,    s2,     s3     )
c2 = zip( s2,    s3,     s1[1:] )  # frame shift of 1
c3 = zip( s3,    s1[1:], s2[1:] )  # frame shift of 2

co1, co2, co3 = [[tr[0]+tr[1]+tr[2] for tr in c] for c in [c1,c2,c3]]

aaseq1, aaseq2, aaseq3 = [[codons[trip] for trip in co] for co in [co1, co2, co3]]

That was just to advertise more Python. For beginners it may be less readable.

This is a further pythonification (just to see where this goes...):

S = [seq[i::3] for i in range(3)]   # three reading frames

C = zip(S[0], S[1], S[2]), zip(S[1], S[2], S[0][1:]), zip(S[2], S[0][1:], S[1][1:]) # group

CO = [[''.join(tr) for tr in c] for c in C]    # tuples to triplet strings

AASEQs = [[codons[trip] for trip in co] for co in CO]  # look up Amino Acids

and finally, if you want to change the three AA sequences into just three long strings:

final_AASEQs = [''.join(AASEQ) for AASEQ in AASEQs]
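
One caveat if you run these steps under Python 3: there zip returns a one-shot iterator rather than a list, so printing c1 and then looping over it again would find it already exhausted. A minimal sketch of the same idea for Python 3, wrapping each zip in list(), might look like this. The small codons dict is a stand-in that covers only the codons in this sequence.

```python
# Truncated, hypothetical codon table covering only the codons in this sequence.
codons = {'atg': 'Met', 'tgg': 'Trp', 'ggg': 'Gly',
          'ggc': 'Gly', 'gcc': 'Ala', 'ccc': 'Pro'}

seq = 'atgggggggcccccc'
s1, s2, s3 = (seq[i::3] for i in range(3))  # every third base, starting at 0, 1, 2

# list() materializes each zip so it can be printed and then iterated again.
c1 = list(zip(s1, s2, s3))
c2 = list(zip(s2, s3, s1[1:]))      # frame shift of 1
c3 = list(zip(s3, s1[1:], s2[1:]))  # frame shift of 2
print(c2)

# Join each tuple into a codon string, look it up, and join the results.
final = [''.join(codons[''.join(triplet)] for triplet in c) for c in (c1, c2, c3)]
print(final)
```

The shifted frames come out one codon shorter than the first because zip stops at the shortest of its arguments.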

Just for fun, here is what the dictionary codons might look like (from Wikipedia). Note the upper case for the A, T, G, C bases, so the sequence in the question would become seq = 'ATGGGGGGGCCCCCC'.

codons = {'CTT': 'Leu', 'ATG': 'Met', 'AAG': 'Lys', 'AAA': 'Lys', 'ATC': 'Ile',
          'AAC': 'Asn', 'ATA': 'Ile', 'AGG': 'Arg', 'CCT': 'Pro', 'ACT': 'Thr',
          'AGC': 'Ser', 'ACA': 'Thr', 'AGA': 'Arg', 'CAT': 'His', 'AAT': 'Asn',
          'ATT': 'Ile', 'CTG': 'Leu', 'CTA': 'Leu', 'CTC': 'Leu', 'CAC': 'His',
          'ACG': 'Thr', 'CAA': 'Gln', 'AGT': 'Ser', 'CAG': 'Gln', 'CCG': 'Pro',
          'CCC': 'Pro', 'TAT': 'Tyr', 'GGT': 'Gly', 'TGT': 'Cys', 'CGA': 'Arg',
          'CCA': 'Pro', 'CGC': 'Arg', 'GAT': 'Asp', 'CGG': 'Arg', 'TTT': 'Phe',
          'TGC': 'Cys', 'GGG': 'Gly', 'TAG': 'STOP', 'GGA': 'Gly', 'TGG': 'Trp',
          'GGC': 'Gly', 'TAC': 'Tyr', 'GAG': 'Glu', 'TCG': 'Ser', 'TTA': 'Leu',
          'GAC': 'Asp', 'CGT': 'Arg', 'GAA': 'Glu', 'TCA': 'Ser', 'GCA': 'Ala',
          'GTA': 'Val', 'GCC': 'Ala', 'GTC': 'Val', 'GCG': 'Ala', 'GTG': 'Val',
          'TTC': 'Phe', 'GTT': 'Val', 'GCT': 'Ala', 'ACC': 'Thr', 'TGA': 'STOP',
          'TTG': 'Leu', 'TCC': 'Ser', 'TAA': 'STOP', 'TCT': 'Ser'}  # ATG also START
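
If the input sequence stays in lower case, one option is to normalize each codon before the lookup instead of retyping the sequence. The sketch below assumes a helper named lookup (made up here) and uses only a three-entry excerpt of the table above to stay short. It is written for Python 3.

```python
# Small excerpt of the uppercase table above, just enough for this demo.
codons = {'ATG': 'Met', 'GGG': 'Gly', 'CCC': 'Pro'}


def lookup(codon, table, unknown='?'):
    """Look up a codon regardless of case; return `unknown` if it is not in the table."""
    return table.get(codon.upper(), unknown)


seq = 'atgggggggcccccc'
# First reading frame only: codons starting at 0, 3, 6, ...
frame1 = [lookup(seq[i:i + 3], codons) for i in range(0, len(seq) - 2, 3)]
print(''.join(frame1))  # MetGlyGlyProPro
```

Returning a placeholder for unknown codons is a design choice for this sketch. Indexing the dict directly and letting a KeyError surface may be preferable when a missing codon indicates bad input.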

Upvotes: 2

lvc

Reputation: 35089

You seem to be looking for the built-in zip function, which allows for lock-step iteration. Combined with tuple unpacking, it works like this:

>>> for i,j in zip(range(3), range(10,13)):
...   print(i,j)
... 
0 10
1 11
2 12

The arguments to zip are anything you can normally use in that part of the for loop, and you can have any number of them (as long as you assign to the same number of variables, or to only one variable which will be a tuple at each iteration).
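
Applied to the question, this could look roughly like the sketch below, where i, j and k step through the sequence together from offsets 0, 1 and 2. The one-letter codons dict is a made-up stand-in that covers only the codons in this sequence, and the length check is there because the last slice of a frame can be shorter than three letters.

```python
# Truncated, hypothetical codon table covering only the codons in this sequence.
codons = {'atg': 'M', 'tgg': 'W', 'ggg': 'G', 'ggc': 'G', 'gcc': 'A', 'ccc': 'P'}

seq = 'atgggggggcccccc'
n = len(seq)
aaseq1, aaseq2, aaseq3 = [], [], []

# i, j and k advance together, three bases at a time, from offsets 0, 1 and 2.
for i, j, k in zip(range(0, n, 3), range(1, n, 3), range(2, n, 3)):
    for start, out in ((i, aaseq1), (j, aaseq2), (k, aaseq3)):
        codon = seq[start:start + 3]
        if len(codon) == 3:  # skip the incomplete codon at the end of a frame
            out.append(codons[codon])

print(''.join(aaseq1))
print(''.join(aaseq2))
print(''.join(aaseq3))
```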

Upvotes: 2
