Reputation: 609
I am looking for a efficient and fast way to do the following in Python 3.x. I am open to using third party libraries such as Numpy as long as the performance is there.
I have a list of ranges containing hundreds of thousands of entries. They're not actually range()'s, but rather the boundary numbers, such as:
list_a = [(1, 100), (300, 550), (551, 1999)]
Then, I iterate over hundred of thousands of other ranges (boundary numbers). I want to find if they contain one of the existing ranges from above. For example:
(0, 600) contains list_a[0] and list_a[1]
(550, 2000) contains list_a[2]
(2000, 2200) does not contain an existing range
Right now, doing something similar to the following, which is way too slow for big amounts of data:
for start, end in get_next_range():
for r in list_a:
if r[0] >= start and r[1] <= end:
# do something
else:
# do something else
Any help would be much appreciated!
Upvotes: 6
Views: 1426
Reputation: 4813
Assuming they are sorted within, i.e. the range values are never (high, low), this will compare all elements in a to all elements in b at the same time:
import numpy as np
list_a = [(1, 100), (300, 550), (551, 1999)]
list_b = [(0, 600), (550, 2000), (2000, 2200), (50, 70)]
a = np.array(a)
b = np.array(b)
comparison = np.logical_and(a[:, 1] >= b[:, 1, None], a[:, 0] <= b[:, 0, None])
idx_a, idx_b = idx = np.nonzero(comparison)
print(a[idx_a])
print(b[idx_b])
array([[ 1, 100],
[ 300, 550],
[ 551, 1999]])
array([[ 0, 600],
[ 0, 600],
[ 550, 2000]])
This gives you the intervals in a that are contained in b. The indices are given in idx_a
and idx_b
.
Upvotes: 0
Reputation: 36838
I would do it following way using numpy
:
import numpy as np
start = 0
finish = 600
lista = np.array([[1,100],[300,550],[551,1999]])
S = lista[:,0]>start
F = lista[:,1]<finish
contains = np.logical_and(S,F)
ind = list(np.flatnonzero(contains))
print(ind) #print [0, 1]
Explanation: firstly I made lista
as np.array
, then sliced it into two parts: one with lower bound ([:,0]
) and second for upper bound ([:,1]
) then used comparison operators, getting 1D np.array
s of bool
s. Using np.logical_an
d I got single 1D np.array
with True
s for fullfiling condition and False
s for rest. Finally I used np.flatnonzero
to get indices of True
s. This solution assumes that all data are in (lowerboundary,upperboundary)
order. Please check if that solution is fast enough for your purpose.
Upvotes: 2