MarkS

Reputation: 1539

range() with variable step integer value

The following code works exactly as intended:

dnasequences = [
    'GCTAGCTAGCTAGCTA',
    'CTAGCTAGCTAGCTAG',
    'TAGCTAGCTAGCTAGC',
    'AGCTAGCTAGCTAGCT'
]

xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}


def dna2rna(sequences):
    rnalist = [xlate[n] for sequence in sequences for n in sequence]
    return rnalist

rnasequences = dna2rna(dnasequences)
print([''.join(rnasequences[i:i+16]) for i in range(0, len(rnasequences), 16)])

It returns: ['CGAUCGAUCGAUCGAU', 'GAUCGAUCGAUCGAUC', 'AUCGAUCGAUCGAUCG', 'UCGAUCGAUCGAUCGA']

I am trying to modify it so that the DNA sequences in the dnasequences list can be of any length.

I am close with this:

dnasequences = [
    'GCTAGCTA',
    'CTAGCTAGCTAGCTAG',
    'TAGCTAGCTAGCTAGC',
    'AGCTAGCTAGCTAGCT'
]

xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}


def dna2rna(sequences):
    rnalist = [xlate[n] for sequence in sequences for n in sequence]
    seqlen = [len(sequence) for sequence in sequences]
    return rnalist, seqlen


def printxlate(rnasq, lens):
    index = 0
    for i in range(0, len(rnasq), lens[index]):
        print([''.join(rnasq[i:i+lens[index]])])
        index += 1


rnasequences, seqlens = dna2rna(dnasequences)
printxlate(rnasequences, seqlens)

It prints the first two translated sequences correctly, but from the third onward the output is wrong. I also have a second issue: this version prints a separate one-element list for each sequence in dnasequences, which I do not want. I want a single list with four elements, as in the first version.

On the first iteration i = 0. On the second iteration i = 8. So far so good.

But on the third iteration (in the PyCharm debugger) I see that i = 16. I believe it should be 24. Because it isn't, the third and fourth translations are wrong, and the program eventually fails with an 'index out of range' error.

If i were 24 on the third iteration and 40 on the fourth, it would work.

I just don't see why it gets the first two iterations correct and then begins failing on the third.

In the first program 'i' steps through 0, 16, 32, and 48 just fine.

Upvotes: 0

Views: 266

Answers (2)

MarkS

Reputation: 1539

Fully functioning corrected second version including a one-to-many translation. Refined: 7-5-2017

from pprint import pprint

dnasequences = [
    'GCTAGCTA',
    'CTAGCTAGCTAGCTAG',
    'TAGCTAGCTAGC',
    'AGCTAGCTAGCTAGCTAGCT',
    'GCTA',
    'CTAGTAGCTGACTCAGTACGTACA'
]

xlate = {'G': 'abc', 'C': 'G', 'T': 'A', 'A': 'U'}

pprint([''.join(xlate[n] for n in sequence) for sequence in dnasequences])

Output: ['abcGAUabcGAU', 'GAUabcGAUabcGAUabcGAUabc', 'AUabcGAUabcGAUabcG', 'UabcGAUabcGAUabcGAUabcGAUabcGA', 'abcGAU', 'GAUabcAUabcGAabcUGAGUabcAUGabcAUGU']

Upvotes: 0

akuiper

Reputation: 214957

The reason you run into trouble is that the nested for in your list comprehension flattens the result, so the boundaries between sequences are lost. You do not have to worry about the problem if you keep one inner list per sequence:

[[xlate[n] for n in sequence] for sequence in dnasequences]
# the inner for loop goes inside its own brackets instead of at the end
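
As a rough illustration of why the flattened version drifts (a sketch, not part of the original code): range() evaluates its step argument once, when the range object is created, so incrementing index inside the loop never changes the step. The names lens, flat, fixed_starts, starts and chunks are made up for illustration, and itertools.accumulate is used here to build the running start offsets the loop would have needed.

```python
from itertools import accumulate

xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
dnasequences = [
    'GCTAGCTA',
    'CTAGCTAGCTAGCTAG',
    'TAGCTAGCTAGCTAGC',
    'AGCTAGCTAGCTAGCT'
]

lens = [len(s) for s in dnasequences]               # [8, 16, 16, 16]
flat = [xlate[n] for s in dnasequences for n in s]  # 56 single characters

# The step is read once, so it stays at lens[0] == 8 for the whole loop.
fixed_starts = list(range(0, len(flat), lens[0]))
print(fixed_starts)  # [0, 8, 16, 24, 32, 40, 48]

# The offsets actually wanted are the running totals of the preceding lengths.
starts = [0] + list(accumulate(lens))[:-1]
print(starts)  # [0, 8, 24, 40]

# Slicing with the correct offsets recovers one string per input sequence.
chunks = [''.join(flat[i:i + n]) for i, n in zip(starts, lens)]
print(chunks)
```

The fixed step yields seven start values for only four lengths, which is consistent with the 'index out of range' error described in the question. Keeping the sequences separate avoids the offset bookkeeping altogether.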

A variation of your second solution would be:

[''.join(xlate[l] for l in s) for s in dnasequences]
# ['CGAUCGAU', 'GAUCGAUCGAUCGAUC', 'AUCGAUCGAUCGAUCG', 'UCGAUCGAUCGAUCGA']

str.translate is likely a better alternative here, since it translates each string in a single call:

table = str.maketrans(xlate)
[s.translate(table) for s in dnasequences]
# ['CGAUCGAU', 'GAUCGAUCGAUCGAUC', 'AUCGAUCGAUCGAUCG', 'UCGAUCGAUCGAUCGA']
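
A small sketch of two properties of str.translate that may matter here, assuming Python 3: the values in the dict passed to str.maketrans can be strings of any length, so a one-to-many mapping like the one in the other answer also works, and characters missing from the table pass through unchanged instead of raising KeyError as xlate[n] would. The names xlate_multi, table_multi and the input 'GCTN' are made up for illustration.

```python
xlate = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
xlate_multi = {'G': 'abc', 'C': 'G', 'T': 'A', 'A': 'U'}  # one-to-many for 'G'

table = str.maketrans(xlate)
table_multi = str.maketrans(xlate_multi)

# A single character can expand to several characters.
one_to_many = 'GCTA'.translate(table_multi)
print(one_to_many)  # abcGAU

# 'N' has no entry in the table, so it is copied through as-is.
unknown_kept = 'GCTN'.translate(table)
print(unknown_kept)  # CGAN
```

Whether silently passing unknown characters through is desirable depends on the data; the dict lookup version fails loudly on unexpected input instead.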

Upvotes: 2
