Reputation: 3
I need to slice a list of nucleotide sequences, e.g. ["ATGCTGACTGCA", "ATGCAGGCGTAG"], according to two other lists: one with the start codon positions and one with the stop codon positions.
I have all my data in a pandas DataFrame, and I extracted it into a NumPy array for the sequences and two lists for the start and stop positions. I've tried a list comprehension:
seq = ["ATGCTGACTGCA", "ATGCAGGCGTAG"]
start = [1, 4]
stop = [6, 12]
[sublist[x:y] for x in start for y in stop for sublist in seq]
I thought this would pair each start with its matching stop and slice the corresponding sequence, but the result is every combination (the new list has 8 entries). What am I doing wrong?
Upvotes: 0
Views: 34
Reputation: 384
I think you need one of the following two approaches:
A zip combined with a nested loop over seq, if you want every (start, stop) pair applied to each sequence (two slices per sequence):
[sublist[x:y] for x,y in zip(start,stop) for sublist in seq]
This gives the following result:
['TGCTG', 'TGCAG', 'TGACTGCA', 'AGGCGTAG']
Or a single zip over all three lists, if each start/stop pair belongs to the sequence at the same position:
[sublist[x:y] for x,y,sublist in zip(start,stop,seq)]
Getting the following result:
['TGCTG', 'AGGCGTAG']
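Since start, stop and seq all come from the same DataFrame rows, the single-zip version is probably the one you want. One caveat: zip silently stops at the shortest input, so a length mismatch would drop rows without any warning. The sketch below adds an explicit check; the helper name slice_by_position is hypothetical. On Python 3.10+, zip(..., strict=True) raises ValueError on unequal lengths as an alternative. With pandas you could also pass the columns directly, e.g. zip(df["seq"], df["start"], df["stop"]), assuming columns with those (hypothetical) names.

```python
def slice_by_position(seqs, starts, stops):
    """Return seqs[i][starts[i]:stops[i]] for each i (hypothetical helper)."""
    # zip() would silently truncate to the shortest input, so check lengths first.
    if not (len(seqs) == len(starts) == len(stops)):
        raise ValueError("seqs, starts and stops must have the same length")
    # Pair each sequence with the start/stop at the same index and slice it.
    return [s[x:y] for s, x, y in zip(seqs, starts, stops)]


seq = ["ATGCTGACTGCA", "ATGCAGGCGTAG"]
start = [1, 4]
stop = [6, 12]
print(slice_by_position(seq, start, stop))  # ['TGCTG', 'AGGCGTAG']
```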
Upvotes: 1
Reputation: 33107
I think you want a zip instead of nesting the start and stop loops.
>>> [s[x:y] for x, y in zip(start, stop) for s in seq]
['TGCTG', 'TGCAG', 'TGACTGCA', 'AGGCGTAG']
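To see why the original comprehension returned 8 entries: its three for clauses behave like nested loops, i.e. the Cartesian product of start, stop and seq, which gives 2 × 2 × 2 = 8 slices. zip instead pairs items by position. A small comparison sketch using itertools.product:

```python
from itertools import product

seq = ["ATGCTGACTGCA", "ATGCAGGCGTAG"]
start = [1, 4]
stop = [6, 12]

# The question's comprehension: every start x every stop x every sequence.
nested = [s[x:y] for x in start for y in stop for s in seq]
# The same thing written as an explicit Cartesian product (rightmost varies fastest).
via_product = [s[x:y] for x, y, s in product(start, stop, seq)]
assert nested == via_product
print(len(nested))  # 8

# zip pairs start[i] with stop[i], so only 2 (start, stop) pairs are produced.
print(list(zip(start, stop)))  # [(1, 6), (4, 12)]
```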
Upvotes: 0