Reputation: 1135
I want to generate a mask from the results of numpy.searchsorted():
import numpy as np
# generate test examples
x = np.random.rand(1000000)
y = np.random.rand(200)
# sort x
idx = np.argsort(x)
sorted_x = np.take_along_axis(x, idx, axis=-1)
# searchsort y in x
pt = np.searchsorted(sorted_x, y)
pt is an array of insertion points. Then I want to create a boolean mask of shape (200, 1000000) whose i-th row is True at the indices idx[0:pt[i]], and I came up with a for-loop like this:
mask = np.zeros((200, 1000000), dtype='bool')
for i in range(200):
    mask[i, idx[0:pt[i]]] = True
Does anyone have an idea to speed up the for-loop?
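To make the intended semantics concrete, here is a tiny worked example (small stand-in arrays instead of the real sizes):
import numpy as np

x = np.array([0.3, 0.1, 0.4, 0.2])
y = np.array([0.25, 0.45])

idx = np.argsort(x)                # [1, 3, 0, 2]
sorted_x = x[idx]                  # [0.1, 0.2, 0.3, 0.4]
pt = np.searchsorted(sorted_x, y)  # [2, 4]

mask = np.zeros((len(y), len(x)), dtype='bool')
for i in range(len(y)):
    mask[i, idx[0:pt[i]]] = True
# row 0 marks the x-values below 0.25: [False, True, False, True]
# row 1 marks the x-values below 0.45: [True, True, True, True]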
Upvotes: 7
Views: 1593
Reputation: 221574
Going by the new-found information picked up from OP's comments, which states that only y is changing in real-time, we can pre-process a lot of stuff around x and hence do much better. We will create a hashing array that stores stepped masks. For the part that involves y, we simply index into the hashing array with the indices obtained from searchsorted, which approximates the final mask array. A final step of assigning the remaining bools can be offloaded to numba, given its ragged nature. This should also be beneficial if we decide to scale up the length of y.
Let's look at the implementation.
Pre-processing with x:
sidx = x.argsort()      # permutation that sorts x
ssidx = sidx.argsort()  # rank of each element of x in the sorted order
# Choose a scale factor.
# 1. A small one would store more mapping info, hence faster, but occupy more mem.
# 2. A big one would store less mapping info, hence slower, but memory efficient.
scale_factor = 100
# Row k of mapar is the mask of the (k*scale_factor) smallest elements of x.
# Note: this assumes every y stays below max(x); otherwise s0 below can index
# one row past the end, and the arange stop would need to be len(x)+1.
mapar = np.arange(0,len(x),scale_factor)[:,None] > ssidx
Remaining steps with y:
import numba as nb

@nb.njit(parallel=True, fastmath=True)
def array_masking3(out, starts, idx, sidx):
    # For each row i, set True at the elements whose rank lies in
    # [starts[i], idx[i]); sidx[j] is the index of the j-th smallest x.
    N = len(out)
    for i in nb.prange(N):
        for j in nb.prange(starts[i], idx[i]):
            out[i, sidx[j]] = True
    return out
idx = np.searchsorted(x,y,sorter=sidx)  # number of elements of x below y[i]
s0 = idx//scale_factor                  # hashing-array row to start from
starts = s0*scale_factor                # ranks already covered by that row
out = mapar[s0]                         # approximate mask (fancy indexing copies)
out = array_masking3(out, starts, idx, sidx)
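A quick way to sanity-check the output (a sketch, not part of the answer's benchmark) is against the brute-force broadcast comparison:
out_ref = x[np.newaxis, :] < y[:, np.newaxis]
assert np.array_equal(out, out_ref)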
Benchmarking
In [2]: x = np.random.rand(1000000)
...: y = np.random.rand(200)
In [3]: ## Pre-processing step with "x"
...: sidx = x.argsort()
...: ssidx = x.argsort().argsort()
...: scale_factor = 100
...: mapar = np.arange(0,len(x),scale_factor)[:,None] > ssidx
In [4]: %%timeit
...: idx = np.searchsorted(x,y,sorter=sidx)
...: s0 = idx//scale_factor
...: starts = s0*scale_factor
...: out = mapar[s0]
...: out = array_masking3(out, starts, idx, sidx)
41 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# A hashing array 1/10th the size has similar timings
In [7]: scale_factor = 1000
...: mapar = np.arange(0,len(x),scale_factor)[:,None] > ssidx
In [8]: %%timeit
...: idx = np.searchsorted(x,y,sorter=sidx)
...: s0 = idx//scale_factor
...: starts = s0*scale_factor
...: out = mapar[s0]
...: out = array_masking3(out, starts, idx, sidx)
40.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# @silgon's soln
In [5]: %timeit x[np.newaxis,:] < y[:,np.newaxis]
138 ms ± 896 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This borrowed a good part from OP's solution.
import numba as nb

@nb.njit(parallel=True)
def array_masking2(mask1D, mask_out, idx, pt):
    n = len(idx)
    for j in nb.prange(len(pt)):
        if mask1D[j]:
            # Row pre-filled with True: flip the tail (ranks >= pt[j]) to False.
            for i in nb.prange(pt[j], n):
                mask_out[j, idx[i]] = False
        else:
            # Row pre-filled with False: set the head (ranks < pt[j]) to True.
            for i in nb.prange(pt[j]):
                mask_out[j, idx[i]] = True
    return mask_out

def app2(idx, pt):
    m, n = len(pt), len(idx)
    mask1 = pt > len(x)//2  # rows where inverting is cheaper
    mask2 = np.broadcast_to(mask1[:,None], (m, n)).copy()
    return array_masking2(mask1, mask2, idx, pt)
So the idea is: once more than half of a row's indices are to be set True, we switch over to setting False instead, after pre-assigning that row as all True. This results in fewer memory accesses and hence a noticeable performance boost.
OP's solution:
@nb.njit(parallel=True, fastmath=True)
def array_masking(mask, idx, pt):
    for j in nb.prange(pt.shape[0]):
        for i in nb.prange(pt[j]):
            mask[j, idx[i]] = True
    return mask

def app1(idx, pt):
    m, n = len(pt), len(idx)
    mask = np.zeros((m, n), dtype='bool')
    return array_masking(mask, idx, pt)
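For reference, the idx and pt fed to app1 and app2 in the timings below are presumably the ones computed in the question's setup, e.g.:
idx = np.argsort(x)
pt = np.searchsorted(x[idx], y)  # x[idx] is the question's sorted_x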
Timings -
In [5]: np.random.seed(0)
...: x = np.random.rand(1000000)
...: y = np.random.rand(200)
In [6]: %timeit app1(idx, pt)
264 ms ± 8.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit app2(idx, pt)
165 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 3
Reputation: 7211
This is an alternative answer, but I'm not sure it's exactly what you need.
x = np.random.rand(1000000)
y = np.random.rand(200)
mask = x[np.newaxis,:] < y[:,np.newaxis]
Note: I said it might not be what you need because you specified the use of numpy.searchsorted() and I don't use it here, yet I get the same result. It might also be useful to somebody else in the future if it doesn't completely fit your needs ;).
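A quick equivalence check against the question's searchsorted-based construction (a sketch with smaller arrays):
import numpy as np

x = np.random.rand(1000)
y = np.random.rand(20)

mask_bc = x[np.newaxis, :] < y[:, np.newaxis]  # broadcast version

idx = np.argsort(x)
pt = np.searchsorted(x[idx], y)
mask_ss = np.zeros((len(y), len(x)), dtype='bool')
for i in range(len(y)):
    mask_ss[i, idx[0:pt[i]]] = True            # question's loop version

assert np.array_equal(mask_bc, mask_ss)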
Timings (@DanielF edit)
Setup:
import numpy as np
# generate test examples
x = np.random.rand(1000000)
y = np.random.rand(200)
# sort x
idx = np.argsort(x)
sorted_x = np.take_along_axis(x, idx, axis=-1)
Running:
%%timeit # silgon
mask = x[np.newaxis,:] < y[:,np.newaxis]
166 ms ± 3.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit # Divakar
pt = np.searchsorted(sorted_x, y)
mask = app2(idx, pt)
316 ms ± 29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit # f.c.
pt = np.searchsorted(sorted_x, y)
mask = np.zeros((200, 1000000), dtype='bool')
for i in range(200):
    mask[i, idx[0:pt[i]]] = True
466 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 2
Reputation: 5949
For better clarity: you're looking for a fast way to determine the mask of indices of items of x that are smaller than y[i]. For example, if the indices that sort the items of x are:
np.argsort(x) = [5, 0, 2, 10, 7, 8, 9, 11, 1, 6, 3, 4]
and you know that 8 items are smaller than y[i], then you need the positions whose rank in the sorted order, given by the inverse of that permutation, is less than 8:
arg_inv = [1, 8, 2, 10, 11, 0, 9, 4, 5, 6, 3, 7]
The easiest approach to this problem is advanced indexing:
length_x, length_y = len(x), len(y)
idx = np.argsort(x)        # permutation that sorts x
arg_inv = np.argsort(idx)  # inverse permutation: rank of each item of x
pt = np.searchsorted(x, y, sorter=idx)
mask = np.zeros((length_y, length_x), dtype='bool')
row, col = np.divmod(np.arange(length_x * length_y), length_x)
mask[row, col] = arg_inv[col] < pt[row]
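As a usage note, the same comparison can also be expressed with broadcasting (a sketch, equivalent to the advanced-indexing assignment above), avoiding the intermediate row and col arrays:
mask = arg_inv[np.newaxis, :] < pt[:, np.newaxis]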
I also add an example with small samples:
x = [0.809 0.958 0.881 0.146 0.882 0.421 0.604]
y = [0.119 0.981 0.775 0.254]
np.sort(x) = [0.146 0.421 0.604 0.809 0.881 0.882 0.958]
np.argsort(x) = [3 5 6 0 2 4 1]
arg_inv = [3 6 4 0 5 1 2]
pt = [0 7 3 1]
Process of advanced indexing:
i   row col arg_inv[col] pt[row] arg_inv[col] < pt[row]
0 0 0 3 0 0
1 0 1 6 0 0
2 0 2 4 0 0
3 0 3 0 0 0
4 0 4 5 0 0
5 0 5 1 0 0
6 0 6 2 0 0
7 1 0 3 7 1
8 1 1 6 7 1
9 1 2 4 7 1
10 1 3 0 7 1
11 1 4 5 7 1
12 1 5 1 7 1
13 1 6 2 7 1
14 2 0 3 3 0
15 2 1 6 3 0
16 2 2 4 3 0
17 2 3 0 3 1
18 2 4 5 3 0
19 2 5 1 3 1
20 2 6 2 3 1
21 3 0 3 1 0
22 3 1 6 1 0
23 3 2 4 1 0
24 3 3 0 1 1
25 3 4 5 1 0
26 3 5 1 1 0
27 3 6 2 1 0
Upvotes: 1
Reputation: 1135
I have implemented a numba version of the for-loop, and it is slightly faster than the Python version. Here is the code:
import numba as nb

@nb.njit(parallel=True, fastmath=True)
def array_masking(mask, idx, pt):
    for j in nb.prange(pt.shape[0]):
        for i in nb.prange(pt[j]):
            mask[j, idx[i]] = True
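Usage is the same as the pure-Python loop (a sketch, assuming idx and pt from the setup above); allocate the mask first, since the function fills it in place and returns nothing:
mask = np.zeros((pt.shape[0], idx.shape[0]), dtype='bool')
array_masking(mask, idx, pt)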
I am looking for further speedup. Any ideas?
Upvotes: 1