jrm
jrm

Reputation: 609

Python, find if a range contains another smaller range from a list of ranges

I am looking for a efficient and fast way to do the following in Python 3.x. I am open to using third party libraries such as Numpy as long as the performance is there.

I have a list of ranges containing hundreds of thousands of entries. They're not actually range()'s, but rather the boundary numbers, such as:

list_a = [(1, 100), (300, 550), (551, 1999)]

Then, I iterate over hundred of thousands of other ranges (boundary numbers). I want to find if they contain one of the existing ranges from above. For example:

(0, 600) contains list_a[0] and list_a[1]
(550, 2000) contains list_a[2]
(2000, 2200) does not contain an existing range

Right now, doing something similar to the following, which is way too slow for big amounts of data:

for start, end in get_next_range():
    for r in list_a:
        if r[0] >= start and r[1] <= end:
            # do something
        else:
            # do something else

Any help would be much appreciated!

Upvotes: 6

Views: 1426

Answers (2)

RunOrVeith
RunOrVeith

Reputation: 4813

Assuming they are sorted within, i.e. the range values are never (high, low), this will compare all elements in a to all elements in b at the same time:

import numpy as np

list_a = [(1, 100), (300, 550), (551, 1999)]
list_b = [(0, 600), (550, 2000), (2000, 2200), (50, 70)]
a = np.array(a)
b = np.array(b)
comparison = np.logical_and(a[:, 1] >= b[:, 1, None], a[:, 0] <= b[:, 0, None])
idx_a, idx_b = idx = np.nonzero(comparison)
print(a[idx_a])
print(b[idx_b])

array([[   1,  100],
       [ 300,  550],
       [ 551, 1999]])

array([[   0,  600],
       [   0,  600],
       [ 550, 2000]])

This gives you the intervals in a that are contained in b. The indices are given in idx_a and idx_b.

Upvotes: 0

Daweo
Daweo

Reputation: 36838

I would do it following way using numpy:

import numpy as np
start = 0
finish = 600
lista = np.array([[1,100],[300,550],[551,1999]])
S = lista[:,0]>start
F = lista[:,1]<finish
contains = np.logical_and(S,F)
ind = list(np.flatnonzero(contains))
print(ind) #print [0, 1]

Explanation: firstly I made lista as np.array, then sliced it into two parts: one with lower bound ([:,0]) and second for upper bound ([:,1]) then used comparison operators, getting 1D np.arrays of bools. Using np.logical_and I got single 1D np.array with Trues for fullfiling condition and Falses for rest. Finally I used np.flatnonzero to get indices of Trues. This solution assumes that all data are in (lowerboundary,upperboundary) order. Please check if that solution is fast enough for your purpose.

Upvotes: 2

Related Questions