Stahlsocke

Reputation: 17

Indexing across multiple intervals

I am trying to extract the n-th element from a set of multiple intervals. I am currently dealing with genome sequences. Assume we have a gene with a gap in the middle. The position of this gene within the whole DNA is:

gene = [100,110], [130,140]
# representing the lists [100,101,...,109] and [130, 131,...,139]
# the gene spans over these entries of the DNA, so it looks like -gene-gap-gene-

Now, for a position within the gene (e.g. the 10th position), I want to find the corresponding position on the whole DNA (109 in this example). Counting positions from 0, the function should do the following:

function(gene, 9) 
> 109
function(gene, 10)
> 130 

My approach is to explicitly generate the two sequences, concatenate them, and take the n-th element of the resulting list. However, for large intervals (which do occur in practice), this is very inefficient.
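
For reference, that approach can be sketched roughly as follows. The name naive_lookup is only illustrative, the intervals are assumed to be half-open [start, end), and n is 0-based as in the calls above:

```python
from itertools import chain

gene = [[100, 110], [130, 140]]

def naive_lookup(gene, n):
    # Materialise every DNA position covered by the gene, then index into it.
    # Memory and time grow with the total length of the gene.
    positions = list(chain.from_iterable(range(start, end) for start, end in gene))
    return positions[n]

print(naive_lookup(gene, 9))   # 109
print(naive_lookup(gene, 10))  # 130
```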

Can anyone think of a simple way?

Thanks in advance!

Upvotes: 0

Views: 28

Answers (2)

Błotosmętek

Reputation: 12927

A generic solution that should work for any number of gaps in the gene. Note that n is 1-based here (the 10th position is n = 10), unlike the 0-based calls in the question:

gene = [[100,110], [130,140]]

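def function(gene, n):
    # n is a 1-based position within the gene (n >= 1 is assumed);
    # each span is a half-open interval [start, end) on the DNA.
    for span in gene:
        span_len = span[1] - span[0]
        if n <= span_len:
            # n falls inside this span
            return n + span[0] - 1
        # otherwise skip this span and carry the remaining offset forward
        n -= span_len
    raise IndexError("position lies beyond the end of the gene")

print(function(gene, 10))  # 109
print(function(gene, 11))  # 130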

Upvotes: 1

PYA

Reputation: 8636

You can pass both lists to your function and use their sizes to work out which list to index and at which offset.

Take function(gene, 10) and function(gene, 11) as examples.

10 <= len(list1) but 11 > len(list1), so for 11 you know you need to access the second list. The right element is at index 11 - len(list1) - 1, which is index 0 of the second list.
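
A minimal sketch of this idea, assuming the two parts of the gene are available as sequences list1 and list2 and that n is a 1-based position. The name lookup and the range check are my own additions for illustration. Using range objects instead of real lists avoids generating the sequences, since len() and indexing on a range take constant time:

```python
list1 = range(100, 110)  # DNA positions 100..109
list2 = range(130, 140)  # DNA positions 130..139

def lookup(list1, list2, n):
    # n is a 1-based position within the gene (an assumption of this sketch).
    if n < 1 or n > len(list1) + len(list2):
        raise IndexError("position outside the gene")
    if n <= len(list1):
        # still inside the first part
        return list1[n - 1]
    # past the first part: subtract its length to get the offset into the second
    return list2[n - len(list1) - 1]

print(lookup(list1, list2, 10))  # 109
print(lookup(list1, list2, 11))  # 130
```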

Upvotes: 0
