FairyDuster

Reputation: 165

How to break down an expression into for-loops

I'm not a Python expert, and I ran into this snippet of code, which works and produces the correct answer, but I'm not sure I understand what happens in the second line:

for i in range(len(motifs[0])):
    best = ''.join([motifs[j][i] for j in range(len(motifs))])
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])

I was trying to replace it with something like:

for i in range(len(motifs[0])):
    for j in range(len(motifs)):
        best =[motifs[j][i]]
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT']) 

and also tried to break down the last line like this:

for i in range(len(motifs[0])):
    for j in range(len(motifs)):
        best =[motifs[j][i]]
    for base in 'ACGT':
        profile.append(best.count(base)+1)/float(len(best)

I tried some more variations, but none of them worked. My question is: what do those expressions (the second and third lines of the first snippet) mean, and how would you break them down into a few lines?

Thanks :)

Upvotes: 0

Views: 89

Answers (3)

Fred Foo

Reputation: 363757

''.join([motifs[j][i] for j in range(len(motifs))])

is idiomatically written

''.join(m[i] for m in motifs)

so it concatenates the i'th entry of all motifs, in order. Similarly,

[(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT']

builds a list of (best.count(bseq)+1)/float(len(seq)) values, one for each character of 'ACGT'; since the base variable doesn't actually occur in the expression, it's a list containing the same value four times and can be simplified to

[(best.count(bseq)+1) / float(len(seq))] * 4
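
Both equivalences can be checked with a minimal sketch. The motifs list and the value 0.5 are made-up placeholders, since the question does not show its real data:

```python
# Hypothetical sample data; the question does not show its real motifs.
motifs = ["ACGT", "ACGA", "TCGT"]
i = 0  # look at the first position of every motif

# Index-based form from the question.
indexed = ''.join([motifs[j][i] for j in range(len(motifs))])
# Direct iteration over the motifs.
direct = ''.join(m[i] for m in motifs)

# A comprehension whose expression ignores its loop variable
# just repeats one value, so list repetition gives the same list.
value = 0.5  # placeholder for any expression that does not use `base`
repeated = [value for base in 'ACGT']
same = [value] * 4
```

Here indexed and direct hold the same string, and repeated and same hold the same four-element list.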

Upvotes: 3

perreal

Reputation: 98078

for i in range(len(motifs[0])):
    seq = ''.join([motifs[j][i] for j in range(len(motifs))])
    profile.append([(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT'])

is equivalent to:

for i in range(len(motifs[0])):
    seq = ''
    for j in range(len(motifs)):
        seq += motifs[j][i]
    profile.append([(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT'])

which can be improved in countless ways.

For example:

# zip(*motifs) groups the i'th characters of all motifs together
seqs = [''.join(column) for column in zip(*motifs)]
bc = best.count(bseq) + 1
profile.extend([bc / float(len(seq)) for base in 'ACGT'] for seq in seqs)

I cannot test its correctness, since the question gives no sample input or expected output.
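
One idiom that may help here is zip(*motifs), which yields the characters position by position. The following is only a sketch with made-up motifs, assuming all motifs have the same length (zip stops at the shortest one):

```python
# Hypothetical sample data, assuming all motifs have the same length.
motifs = ["ACG", "ACT", "TCG"]

# zip(*motifs) yields one tuple per position: ('A', 'A', 'T'), ('C', 'C', 'C'), ...
columns = [''.join(chars) for chars in zip(*motifs)]

# The index-based construction, for comparison.
by_index = [''.join(m[i] for m in motifs) for i in range(len(motifs[0]))]
```

Both lists contain the same per-position strings, so either form can feed the profile computation.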

Upvotes: 1

Jakob Bowyer

Reputation: 34718

This is the closest I got without being able to test it:

for i, _ in enumerate(motifs[0]):
    seq = ""
    for m in motifs:
        seq += m[i]

    tmp = []
    for base in "ACGT":
        tmp.append((best.count(bseq) + 1) / float(len(seq)))
    profile.append(tmp)
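
For comparison, here is a self-contained sketch of the fully unrolled loops that can actually be run. It assumes the variant shown in the question, where the count uses the loop variable (best.count(base)) and the per-position string is the one being counted; the motifs and the name column are made up for illustration:

```python
# Hypothetical sample data; the real motifs are not shown in the question.
motifs = ["ACG", "ACT", "TCG"]

profile = []
for i in range(len(motifs[0])):
    # Collect the i-th character of every motif into one string.
    column = ""
    for m in motifs:
        column += m[i]

    # One value per base: (occurrences + 1) / column length.
    row = []
    for base in "ACGT":
        row.append((column.count(base) + 1) / float(len(column)))
    profile.append(row)

# The compact form from the question, for comparison.
compact = []
for i in range(len(motifs[0])):
    best = ''.join([motifs[j][i] for j in range(len(motifs))])
    compact.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
```

With this data, profile and compact end up identical: one row per position, each with four values in the order A, C, G, T.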

Upvotes: 1
