Reputation: 1135
I want to generate a mask from the results of numpy.searchsorted():
import numpy as np
# generate test examples
x = np.random.rand(1000000)
y = np.random.rand(200)
# sort x
idx = np.argsort(x)
sorted_x = np.take_along_axis(x, idx, axis=-1)
# searchsort y in x
pt = np.searchsorted(sorted_x, y)
pt is an array of insertion points. Then I want to create a boolean mask of shape (200, 1000000) whose i-th row is True at the indices idx[0:pt[i]], and I came up with a for-loop like this:
mask = np.zeros((200, 1000000), dtype='bool')
for i in range(200):
    mask[i, idx[0:pt[i]]] = True
Does anyone have an idea to speed up the for-loop?
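To make the intended semantics concrete, here is a tiny worked example (small stand-in arrays instead of the real sizes):
import numpy as np

x = np.array([0.3, 0.1, 0.4, 0.2])
y = np.array([0.25, 0.45])

idx = np.argsort(x)                # [1, 3, 0, 2]
sorted_x = x[idx]                  # [0.1, 0.2, 0.3, 0.4]
pt = np.searchsorted(sorted_x, y)  # [2, 4]

mask = np.zeros((len(y), len(x)), dtype='bool')
for i in range(len(y)):
    mask[i, idx[0:pt[i]]] = True
# row 0 marks the x-values below 0.25: [False, True, False, True]
# row 1 marks the x-values below 0.45: [True, True, True, True]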
Upvotes: 7
Views: 1593
Reputation: 221574
Going by the new-found information picked up from OP's comments, which states that only y is changing in real-time, we can pre-process a lot of stuff around x and hence do much better. We will create a hashing array that stores stepped masks. For the part that involves y, we simply index into the hashing array with the indices obtained from searchsorted, which approximates the final mask array. A final step of assigning the remaining bools can be offloaded to numba, given its ragged nature. This should also be beneficial if we decide to scale up the length of y.
Let's look at the implementation.
Pre-processing with x:
sidx = x.argsort()      # permutation that sorts x
ssidx = sidx.argsort()  # rank of each element of x in the sorted order
# Choose a scale factor.
# 1. A small one would store more mapping info, hence faster, but occupy more mem.
# 2. A big one would store less mapping info, hence slower, but memory efficient.
scale_factor = 100
# Row k of mapar is the mask of the (k*scale_factor) smallest elements of x.
# Note: this assumes every y stays below max(x); otherwise s0 below can index
# one row past the end, and the arange stop would need to be len(x)+1.
mapar = np.arange(0,len(x),scale_factor)[:,None] > ssidx
Remaining steps with y:
import numba as nb

@nb.njit(parallel=True, fastmath=True)
def array_masking3(out, starts, idx, sidx):
    # For each row i, set True at the elements whose rank lies in
    # [starts[i], idx[i]); sidx[j] is the index of the j-th smallest x.
    N = len(out)
    for i in nb.prange(N):
        for j in nb.prange(starts[i], idx[i]):
            out[i, sidx[j]] = True
    return out
idx = np.searchsorted(x,y,sorter=sidx)  # number of elements of x below y[i]
s0 = idx//scale_factor                  # hashing-array row to start from
starts = s0*scale_factor                # ranks already covered by that row
out = mapar[s0]                         # approximate mask (fancy indexing copies)
out = array_masking3(out, starts, idx, sidx)
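A quick way to sanity-check the output (a sketch, not part of the answer's benchmark) is against the brute-force broadcast comparison:
out_ref = x[np.newaxis, :] < y[:, np.newaxis]
assert np.array_equal(out, out_ref)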
Benchmarking
In [2]: x = np.random.rand(1000000)
...: y = np.random.rand(200)
In [3]: ## Pre-processing step with "x"
...: sidx = x.argsort()
...: ssidx = x.argsort().argsort()
...: scale_factor = 100
...: mapar = np.arange(0,len(x),scale_factor)[:,None] > ssidx
In [4]: %%timeit
...: idx = np.searchsorted(x,y,sorter=sidx)
...: s0 = idx//scale_factor
...: starts = s0*scale_factor
...: out = mapar[s0]
...: out = array_masking3(out, starts, idx, sidx)
41 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# A hashing array 1/10th the size has similar timings
In [7]: scale_factor = 1000
...: mapar = np.arange(0,len(x),scale_factor)[:,None] > ssidx
In [8]: %%timeit
...: idx = np.searchsorted(x,y,sorter=sidx)
...: s0 = idx//scale_factor
...: starts = s0*scale_factor
...: out = mapar[s0]
...: out = array_masking3(out, starts, idx, sidx)
40.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# @silgon's soln
In [5]: %timeit x[np.newaxis,:] < y[:,np.newaxis]
138 ms ± 896 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
This borrowed a good part from OP's solution.
import numba as nb

@nb.njit(parallel=True)
def array_masking2(mask1D, mask_out, idx, pt):
    n = len(idx)
    for j in nb.prange(len(pt)):
        if mask1D[j]:
            # Row pre-filled with True: flip the tail (ranks >= pt[j]) to False.
            for i in nb.prange(pt[j], n):
                mask_out[j, idx[i]] = False
        else:
            # Row pre-filled with False: set the head (ranks < pt[j]) to True.
            for i in nb.prange(pt[j]):
                mask_out[j, idx[i]] = True
    return mask_out

def app2(idx, pt):
    m, n = len(pt), len(idx)
    mask1 = pt > len(x)//2  # rows where inverting is cheaper
    mask2 = np.broadcast_to(mask1[:,None], (m, n)).copy()
    return array_masking2(mask1, mask2, idx, pt)
So the idea is: once more than half of a row's indices are to be set True, we switch over to setting False instead, after pre-assigning that row as all True. This results in fewer memory accesses and hence a noticeable performance boost.
OP's solution:
@nb.njit(parallel=True, fastmath=True)
def array_masking(mask, idx, pt):
    for j in nb.prange(pt.shape[0]):
        for i in nb.prange(pt[j]):
            mask[j, idx[i]] = True
    return mask

def app1(idx, pt):
    m, n = len(pt), len(idx)
    mask = np.zeros((m, n), dtype='bool')
    return array_masking(mask, idx, pt)
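For reference, the idx and pt fed to app1 and app2 in the timings below are presumably the ones computed in the question's setup, e.g.:
idx = np.argsort(x)
pt = np.searchsorted(x[idx], y)  # x[idx] is the question's sorted_x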
Timings -
In [5]: np.random.seed(0)
...: x = np.random.rand(1000000)
...: y = np.random.rand(200)
In [6]: %timeit app1(idx, pt)
264 ms ± 8.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit app2(idx, pt)
165 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 3
Reputation: 7211
This is an alternative answer, but I'm not sure it's exactly what you need.
x = np.random.rand(1000000)
y = np.random.rand(200)
mask = x[np.newaxis,:] < y[:,np.newaxis]
Note: I said it might not be what you need because you specified the use of numpy.searchsorted() and I don't use it here, yet I get the same result. It might also be useful to somebody else in the future if it doesn't completely fit your needs ;).
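A quick equivalence check against the question's searchsorted-based construction (a sketch with smaller arrays):
import numpy as np

x = np.random.rand(1000)
y = np.random.rand(20)

mask_bc = x[np.newaxis, :] < y[:, np.newaxis]  # broadcast version

idx = np.argsort(x)
pt = np.searchsorted(x[idx], y)
mask_ss = np.zeros((len(y), len(x)), dtype='bool')
for i in range(len(y)):
    mask_ss[i, idx[0:pt[i]]] = True            # question's loop version

assert np.array_equal(mask_bc, mask_ss)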
Timings (@DanielF edit)
Setup:
import numpy as np
# generate test examples
x = np.random.rand(1000000)
y = np.random.rand(200)
# sort x
idx = np.argsort(x)
sorted_x = np.take_along_axis(x, idx, axis=-1)
Running:
%%timeit # silgon
mask = x[np.newaxis,:] < y[:,np.newaxis]
166 ms ± 3.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit # Divakar
pt = np.searchsorted(sorted_x, y)
mask = app2(idx, pt)
316 ms ± 29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit # f.c.
pt = np.searchsorted(sorted_x, y)
mask = np.zeros((200, 1000000), dtype='bool')
for i in range(200):
    mask[i, idx[0:pt[i]]] = True
466 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 2
Reputation: 5949
For better clarity: you're looking for a fast way to determine the mask of indices of items of x that are smaller than y[i]. For example, if the indices that sort the items of x are:
np.argsort(x) = [5, 0, 2, 10, 7, 8, 9, 11, 1, 6, 3, 4]
and you know that 8 items are smaller than y[i], then you need the positions whose rank in the sorted order, given by the inverse of that permutation, is less than 8:
arg_inv = [1, 8, 2, 10, 11, 0, 9, 4, 5, 6, 3, 7]
The easiest approach to this problem is advanced indexing:
length_x, length_y = len(x), len(y)
idx = np.argsort(x)        # permutation that sorts x
arg_inv = np.argsort(idx)  # inverse permutation: rank of each item of x
pt = np.searchsorted(x, y, sorter=idx)
mask = np.zeros((length_y, length_x), dtype='bool')
row, col = np.divmod(np.arange(length_x * length_y), length_x)
mask[row, col] = arg_inv[col] < pt[row]
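As a usage note, the same comparison can also be expressed with broadcasting (a sketch, equivalent to the advanced-indexing assignment above), avoiding the intermediate row and col arrays:
mask = arg_inv[np.newaxis, :] < pt[:, np.newaxis]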
I also add an example with small samples:
x = [0.809 0.958 0.881 0.146 0.882 0.421 0.604]
y = [0.119 0.981 0.775 0.254]
np.sort(x) = [0.146 0.421 0.604 0.809 0.881 0.882 0.958]
np.argsort(x) = [3 5 6 0 2 4 1]
arg_inv = [3 6 4 0 5 1 2]
pt = [0 7 3 1]
Process of advanced indexing:
i   row col arg_inv[col] pt[row] arg_inv[col] < pt[row]
0 0 0 3 0 0
1 0 1 6 0 0
2 0 2 4 0 0
3 0 3 0 0 0
4 0 4 5 0 0
5 0 5 1 0 0
6 0 6 2 0 0
7 1 0 3 7 1
8 1 1 6 7 1
9 1 2 4 7 1
10 1 3 0 7 1
11 1 4 5 7 1
12 1 5 1 7 1
13 1 6 2 7 1
14 2 0 3 3 0
15 2 1 6 3 0
16 2 2 4 3 0
17 2 3 0 3 1
18 2 4 5 3 0
19 2 5 1 3 1
20 2 6 2 3 1
21 3 0 3 1 0
22 3 1 6 1 0
23 3 2 4 1 0
24 3 3 0 1 1
25 3 4 5 1 0
26 3 5 1 1 0
27 3 6 2 1 0
Upvotes: 1
Reputation: 1135
I have implemented a numba version of the for-loop, and it is slightly faster than the Python version. Here is the code:
import numba as nb

@nb.njit(parallel=True, fastmath=True)
def array_masking(mask, idx, pt):
    for j in nb.prange(pt.shape[0]):
        for i in nb.prange(pt[j]):
            mask[j, idx[i]] = True
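Usage is the same as the pure-Python loop (a sketch, assuming idx and pt from the setup above); allocate the mask first, since the function fills it in place and returns nothing:
mask = np.zeros((pt.shape[0], idx.shape[0]), dtype='bool')
array_masking(mask, idx, pt)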
I am looking for further speedup. Any ideas?
Upvotes: 1