Reputation: 1539
The following code works exactly as intended:
dnasequences = [
'GCTAGCTAGCTAGCTA',
'CTAGCTAGCTAGCTAG',
'TAGCTAGCTAGCTAGC',
'AGCTAGCTAGCTAGCT'
]
xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
def dna2rna(sequences):
    rnalist = [xlate[n] for sequence in sequences for n in sequence]
    return rnalist
rnasequences = dna2rna(dnasequences)
print([''.join(rnasequences[i:i+16]) for i in range(0, len(rnasequences), 16)])
It prints: ['CGAUCGAUCGAUCGAU', 'GAUCGAUCGAUCGAUC', 'AUCGAUCGAUCGAUCG', 'UCGAUCGAUCGAUCGA']
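Unrolling the comprehension in dna2rna shows why the first version depends on the fixed chunk size of 16. A comprehension with two for clauses is equivalent to two nested for loops that append every translated base to one flat list, so the boundaries between the input sequences are lost. A minimal sketch (the loop-based name dna2rna_loops is hypothetical, not from the post):

```python
xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}


def dna2rna_loops(sequences):
    # Same result as [xlate[n] for sequence in sequences for n in sequence]:
    # the outer loop runs over the sequences, the inner loop over the bases.
    rnalist = []
    for sequence in sequences:
        for n in sequence:
            rnalist.append(xlate[n])  # one flat list, no per-sequence grouping
    return rnalist


dnasequences = [
    'GCTAGCTAGCTAGCTA',
    'CTAGCTAGCTAGCTAG',
    'TAGCTAGCTAGCTAGC',
    'AGCTAGCTAGCTAGCT'
]
flat = dna2rna_loops(dnasequences)
print(len(flat))  # 64 single characters; the per-sequence lengths are gone
```

Because every sequence here is 16 bases long, slicing the flat list in steps of 16 happens to recover the original boundaries.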
I am trying to modify it so that the DNA sequences in dnasequences can be of any length.
I am close with this:
dnasequences = [
'GCTAGCTA',
'CTAGCTAGCTAGCTAG',
'TAGCTAGCTAGCTAGC',
'AGCTAGCTAGCTAGCT'
]
xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
def dna2rna(sequences):
    rnalist = [xlate[n] for sequence in sequences for n in sequence]
    seqlen = [len(sequence) for sequence in sequences]
    return rnalist, seqlen
def printxlate(rnasq, lens):
    index = 0
    for i in range(0, len(rnasq), lens[index]):
        print([''.join(rnasq[i:i+lens[index]])])
        index += 1
rnasequences, seqlens = dna2rna(dnasequences)
printxlate(rnasequences, seqlens)
It prints the first two translated sequences correctly, but from the third one on the output is wrong. (I also have a second issue: the second version prints a separate list for each sequence in dnasequences, which I do not want. I want a single list with four elements, as in the first version.)
On the first iteration i = 0. On the second iteration i = 8. So far so good.
But on the third iteration (in the PyCharm debugger) I see that i = 16. I believe it should be 24. Since it isn't, the third and fourth translations are wrong and it errors out with an 'index out of range' error.
If the third iteration were i = 24 and the fourth i = 40, it would work.
I just don't see why it gets the first two iterations correct and then begins failing on the third.
In the first program 'i' steps through 0, 16, 32, and 48 just fine.
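The stepping follows from how range works: its step is fixed when range() is called, and the for statement calls range(0, len(rnasq), lens[index]) only once, while index is still 0. Incrementing index inside the loop changes the slice width but not the step, which stays at lens[0], i.e. 8. The sketch below reproduces only the stepping (so it does not raise), then shows one possible fix that advances a running offset by each sequence's own length and returns a single list. The helper name split_by_lengths is hypothetical, not from the post.

```python
xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
dnasequences = ['GCTAGCTA', 'CTAGCTAGCTAGCTAG', 'TAGCTAGCTAGCTAGC', 'AGCTAGCTAGCTAGCT']
lens = [len(s) for s in dnasequences]  # [8, 16, 16, 16]
total = sum(lens)                      # 56 characters once flattened

# Reproduce the stepping only: range() is evaluated once, with step lens[0] == 8.
index = 0
steps = []
for i in range(0, total, lens[index]):
    steps.append(i)
    index += 1  # changes index, but not the step of the range already built
print(steps)  # [0, 8, 16, 24, 32, 40, 48]


def split_by_lengths(flat, lengths):
    """Regroup a flat list of characters into strings of the given lengths."""
    chunks = []
    start = 0
    for length in lengths:
        chunks.append(''.join(flat[start:start + length]))
        start += length  # advance by this sequence's own length, not a fixed step
    return chunks


rnalist = [xlate[n] for sequence in dnasequences for n in sequence]
print(split_by_lengths(rnalist, lens))
# ['CGAUCGAU', 'GAUCGAUCGAUCGAUC', 'AUCGAUCGAUCGAUCG', 'UCGAUCGAUCGAUCGA']
```

The start offsets could also be computed with itertools.accumulate; the explicit loop just keeps the bookkeeping visible.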
Upvotes: 0
Views: 266
Reputation: 1539
Fully functioning corrected second version including a one-to-many translation. Refined: 7-5-2017
from pprint import pprint
dnasequences = [
'GCTAGCTA',
'CTAGCTAGCTAGCTAG',
'TAGCTAGCTAGC',
'AGCTAGCTAGCTAGCTAGCT',
'GCTA',
'CTAGTAGCTGACTCAGTACGTACA'
]
xlate = {'G': 'abc', 'C': 'G', 'T': 'A', 'A': 'U'}
pprint([''.join(xlate[n] for n in sequence) for sequence in dnasequences])
Output:
['abcGAUabcGAU',
'GAUabcGAUabcGAUabcGAUabc',
'AUabcGAUabcGAUabcG',
'UabcGAUabcGAUabcGAUabcGAUabcGA',
'abcGAU',
'GAUabcAUabcGAabcUGAGUabcAUGabcAUGU']
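The same one-to-many translation can also be written with str.translate, because str.maketrans accepts a dict whose values are strings of any length, so 'G' can map to 'abc'. A sketch comparing both forms on the data above:

```python
xlate = {'G': 'abc', 'C': 'G', 'T': 'A', 'A': 'U'}
dnasequences = [
    'GCTAGCTA',
    'CTAGCTAGCTAGCTAG',
    'TAGCTAGCTAGC',
    'AGCTAGCTAGCTAGCTAGCT',
    'GCTA',
    'CTAGTAGCTGACTCAGTACGTACA'
]

# Character-by-character dict lookup, as in the corrected version above
joined = [''.join(xlate[n] for n in sequence) for sequence in dnasequences]

# str.maketrans builds a table keyed by code points; multi-character values are allowed
table = str.maketrans(xlate)
translated = [sequence.translate(table) for sequence in dnasequences]

print(joined == translated)  # True
```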
Upvotes: 0
Reputation: 214957
The reason you get into trouble is that you are flattening the result by using a nested for-in list comprehension. You would not have this problem if you used:
[[xlate[n] for n in sequence] for sequence in dnasequences]
#          ^^^^^^^^^^^^^^^^^ put the inner for loop here instead of at the end
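Using the question's names, the nested form builds one inner list per DNA sequence, so each sequence keeps its own length and no re-chunking is needed afterwards. A minimal sketch contrasting it with the flat form:

```python
xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
dnasequences = ['GCTAGCTA', 'CTAGCTAGCTAGCTAG', 'TAGCTAGCTAGCTAGC', 'AGCTAGCTAGCTAGCT']

# Flat: the inner for clause comes last, so every base lands in one list
flat = [xlate[n] for sequence in dnasequences for n in sequence]

# Nested: the inner comprehension produces one list per sequence
nested = [[xlate[n] for n in sequence] for sequence in dnasequences]

print(len(flat))                     # 56
print([len(rna) for rna in nested])  # [8, 16, 16, 16]
print([''.join(rna) for rna in nested])
```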
A variation of your second solution would be:
[''.join(xlate[l] for l in s) for s in dnasequences]
# ['CGAUCGAU', 'GAUCGAUCGAUCGAUC', 'AUCGAUCGAUCGAUCG', 'UCGAUCGAUCGAUCGA']
str.translate should be a better alternative here:
table = str.maketrans(xlate)
[s.translate(table) for s in dnasequences]
# ['CGAUCGAU', 'GAUCGAUCGAUCGAUC', 'AUCGAUCGAUCGAUCG', 'UCGAUCGAUCGAUCGA']
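One behavioral difference between the two versions may matter with real data: xlate[l] raises KeyError for a character that is not in the dict, while str.translate leaves characters missing from the table unchanged. A sketch using a hypothetical input containing 'N':

```python
xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
table = str.maketrans(xlate)

sample = 'GCTN'  # hypothetical input containing a character that xlate lacks

# str.translate passes characters missing from the table through unchanged
translated = sample.translate(table)
print(translated)  # CGAN

# The dict lookup in the join-based version raises KeyError instead
try:
    ''.join(xlate[l] for l in sample)
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)  # True
```

Which behavior is preferable depends on whether unexpected characters should be flagged or silently passed through.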
Upvotes: 2