Reputation: 165
I'm not a Python expert. I ran into this snippet, which works and produces the correct answer, but I'm not sure I understand what happens in the second line:
for i in range(len(motifs[0])):
    best = ''.join([motifs[j][i] for j in range(len(motifs))])
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
I was trying to replace it with something like:
for i in range(len(motifs[0])):
    for j in range(len(motifs)):
        best =[motifs[j][i]]
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
and also tried to break down the last line like this:
for i in range(len(motifs[0])):
    for j in range(len(motifs)):
        best =[motifs[j][i]]
    for base in 'ACGT':
        profile.append(best.count(base)+1)/float(len(best)
I tried some more variations, but none of them worked. My question is: what do those expressions (the second and third lines of the first snippet) mean, and how would you break them down into a few lines?
Thanks :)
Upvotes: 0
Views: 89
Reputation: 363757
''.join([motifs[j][i] for j in range(len(motifs))])
is idiomatically written
''.join(m[i] for m in motifs)
so it concatenates the i'th entry of all motifs, in order. Similarly,
[(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT']
builds a list of (best.count(bseq)+1)/float(len(seq)) values, one for each base in ACGT; since the base variable doesn't actually occur in that expression, it's a list containing the same value four times and can be simplified to
[(best.count(bseq)+1) / float(len(seq))] * 4
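To see what the join expression produces, here is a minimal sketch on made-up data (the three motifs below are hypothetical, not from the question). It also uses zip(*motifs), a built-in way to walk all columns at once, which is an alternative not mentioned above:

```python
# Hypothetical sample data: three motifs of equal length.
motifs = ["ACGT", "AGGT", "ACGA"]

# The i'th character of every motif, joined into one string.
i = 0
column = ''.join(m[i] for m in motifs)

# zip(*motifs) yields each column as a tuple of characters,
# so every column can be built without indexing.
columns = [''.join(chars) for chars in zip(*motifs)]

print(column)
print(columns)
```

With this input, column holds the first character of each motif, and columns holds one such string per position.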
Upvotes: 3
Reputation: 98078
for i in range(len(motifs[0])):
    seq = ''.join([motifs[j][i] for j in range(len(motifs))])
    profile.append([(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT'])
is equivalent to:
for i in range(len(motifs[0])):
    seq = ''
    for j in range(len(motifs)):
        seq += motifs[j][i]
    profile.append([(best.count(bseq)+1)/float(len(seq)) for base in 'ACGT'])
which can be improved in countless ways.
For example:
seqs = [''.join(motif) for motif in motifs]
bc = best.count(bseq) + 1
profile.extend([map(lambda x: bc / float(len(x)), seq)
                for base in 'ACGT'] for seq in seqs)
I cannot test its correctness, since no sample input or expected output is given.
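One possible compact version, as a sketch only. It assumes the count is meant to use the loop variable base and the joined column, as in the question's snippet. The helper name build_profile and the sample motifs are made up for illustration:

```python
def build_profile(motifs):
    """Return one [A, C, G, T] row per column, using the
    (count + 1) / length formula from the question."""
    profile = []
    # zip(*motifs) iterates over columns instead of rows.
    for chars in zip(*motifs):
        column = ''.join(chars)
        profile.append([(column.count(base) + 1) / float(len(column))
                        for base in 'ACGT'])
    return profile


# Hypothetical sample data.
motifs = ["ACGT", "AGGT", "ACGA"]
profile = build_profile(motifs)
for row in profile:
    print(row)
```

Wrapping the loop in a function makes it easy to check against a small input by hand before running it on real data.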
Upvotes: 1
Reputation: 34718
This is the closest I got, without being able to test it:
for i, _ in enumerate(motifs[0]):
    seq = ""
    for m in motifs:
        seq += m[i]
    tmp = []
    for base in "ACGT":
        # parentheses matter: divide (count + 1), not just 1
        tmp.append((best.count(bseq) + 1) / float(len(seq)))
    profile.append(tmp)
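As a side note on why the nested-loop attempts in the question fail: assigning best = [motifs[j][i]] inside the inner loop rebinds best on every pass, so only the last motif's character survives. Here is a sketch on hypothetical sample data, assuming the count uses base as in the question's snippet:

```python
# Hypothetical sample data; look at the last column (index 3).
motifs = ["ACGT", "AGGT", "ACGA"]
i = 3

# Rebinding inside the loop keeps only the last character.
for j in range(len(motifs)):
    best = [motifs[j][i]]
overwritten = best

# Accumulating with += keeps every character of the column.
best = ''
for j in range(len(motifs)):
    best += motifs[j][i]
accumulated = best

# One profile row, written out without a list comprehension.
row = []
for base in 'ACGT':
    row.append((accumulated.count(base) + 1) / float(len(accumulated)))

print(overwritten)
print(accumulated)
print(row)
```

The accumulating loop is the explicit form of the ''.join(...) call, and the final loop is the explicit form of the list comprehension passed to profile.append.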
Upvotes: 1