Reputation: 2410
I have a list of numbers and a list of ranges. I would like to efficiently check, for every number, whether it lies within any of the ranges, and get the result as a boolean array. Like this:
a=[1,2,3,4,7,8,9]
b=[(0,3),(8,10)]
f(a,b)=>[True,True,True,False,False,True,True]
Upvotes: 1
Views: 76
Reputation: 53119
If the intervals are non-overlapping, np.searchsorted should be rather efficient:
(np.nextafter(b,b+np.arange(-1,1)).ravel().searchsorted(a)&1).astype(bool)
# array([ True, True, True, False, False, True, True])
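The one-liner above is fairly dense, so here is a more explicit sketch of the same searchsorted idea. It counts how many intervals have started at or before each value and how many have already ended, and treats a value as covered when the first count is larger. The function name `in_any_range` is made up for illustration, and the sketch assumes closed intervals with low <= high.

```python
import numpy as np

def in_any_range(a, b):
    """Return a boolean array: is each value of a inside any closed range in b?"""
    a = np.asarray(a)
    b = np.asarray(b)
    # Sort the lower and upper bounds independently.
    lows = np.sort(b[:, 0])
    highs = np.sort(b[:, 1])
    # Number of intervals whose lower bound is <= x.
    started = np.searchsorted(lows, a, side="right")
    # Number of intervals whose upper bound is < x.
    ended = np.searchsorted(highs, a, side="left")
    # x is covered when more intervals have started than have ended.
    return started > ended

a = [1, 2, 3, 4, 7, 8, 9]
b = [(0, 3), (8, 10)]
print(in_any_range(a, b))
```

Swapping the two `side` arguments should give open intervals instead of closed ones.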
Timings using @Divakar's benchit:
[Figure: timings plot]
Code for making the plot:
import benchit
import numpy as np
import pandas as pd

# searchsorted on the flattened interval bounds
def pp(ab):
    a,b=ab
    return (np.nextafter(b,b+np.arange(-1,1)).ravel().searchsorted(a)&1) \
        .astype(bool)

# boolean mask lookup
def dv(ab):
    a,b=ab
    L = max(np.max(a), max(max(b))+1)+1
    mask = np.zeros(L, dtype=bool)
    for (i,j) in b:
        mask[i:j+1] = 1
    return mask[a]

# pandas IntervalIndex
def ys(ab):
    a,b=ab
    return [any(y in x for x in pd.IntervalIndex.from_tuples(b,closed='both')) for y in a]

# pure Python
def cn(ab):
    a,b=ab
    return [
        any(low <= i <= high for low, high in b)
        for i in a
    ]

# random sorted, non-overlapping intervals and random query values
def make(n):
    b = np.random.randint(1,11,(n//10*2)).cumsum().reshape(-1,2)
    b = [(x,y) for x,y in b.tolist()]
    a = np.random.randint(0,n,n//3).tolist()
    return a,b

in_ = {n:make(n) for n in [10,20,50,100,200,500,1000]}
funcs = [pp,dv,ys,cn]
t = benchit.timings(funcs, in_)
t.rank()
t.plot(logx=True, save='timings.png')
Upvotes: 1
Reputation: 221754
Here's one with masking:
# assumes non-negative integer values and bounds
L = max(np.max(a), max(max(b))+1)+1
mask = np.zeros(L, dtype=bool)
for (i,j) in b:
    mask[i:j+1] = 1   # mark every integer in the closed range
out = mask[a]
Upvotes: 0
Reputation: 4564
Here's a pure Python version. Obviously you can change <= to < if you want exclusive ranges.
a = [1, 2, 3, 4, 7, 8, 9]
b = [(0, 3), (8, 10)]
result = [
    any(low <= i <= high for low, high in b)
    for i in a
]
# [True, True, True, False, False, True, True]
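To make the inclusive/exclusive remark concrete, here is a small sketch that wraps the same comprehension in a function with a switch. The name `in_ranges` and the `inclusive` parameter are just for illustration, not something from the answer above.

```python
def in_ranges(values, ranges, inclusive=True):
    """Check each value against every (low, high) pair in ranges."""
    if inclusive:
        # Endpoints count as inside the range.
        return [any(low <= v <= high for low, high in ranges) for v in values]
    # Endpoints are excluded.
    return [any(low < v < high for low, high in ranges) for v in values]

a = [1, 2, 3, 4, 7, 8, 9]
b = [(0, 3), (8, 10)]
print(in_ranges(a, b))                   # endpoints 3 and 8 count as inside
print(in_ranges(a, b, inclusive=False))  # endpoints 3 and 8 are excluded
```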
Upvotes: 2
Reputation: 323396
We can use a pandas IntervalIndex:
[any(y in x for x in pd.IntervalIndex.from_tuples(b,closed='both')) for y in a ]
Out[48]: [True, True, True, False, False, True, True]
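If the intervals do not overlap, the Python-level loop can probably be avoided by letting the index do the lookup. This is only a sketch: `in_any_interval` is a made-up name, and it relies on `IntervalIndex.get_indexer` returning -1 for values that fall in no interval, which as far as I know only works when the intervals are non-overlapping.

```python
import pandas as pd

def in_any_interval(a, b):
    """Boolean array: is each value of a inside any closed interval in b?"""
    # Assumes the intervals in b do not overlap.
    idx = pd.IntervalIndex.from_tuples(b, closed="both")
    # get_indexer gives the position of the containing interval, or -1 if none.
    return idx.get_indexer(a) != -1

a = [1, 2, 3, 4, 7, 8, 9]
b = [(0, 3), (8, 10)]
print(in_any_interval(a, b))
```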
Upvotes: 3