Reputation: 544
I have a function that, given a numpy array of xy coordinates, it filters those which lies within a box of side L
import numpy as np
from numba import njit
np.random.seed(65238758)
L = 10
N = 1000
xy = np.random.uniform(0, 50, (N, 2))
box = np.array([
[0,0], # lower-left
[L,L] # upper-right
])
def sinjit(xy, box):
mask = np.all(np.logical_and(xy >= box[0], xy <= box[1]), axis=1)
return xy[mask]
If I run this function it returns the correct result:
sinjit(xy, box)
Output: array([[5.53200522, 7.86890708],
[4.60188554, 9.15249881],
[9.072563 , 5.6874726 ],
[4.48976127, 8.73258166],
...
[6.29683131, 5.34225758],
[2.68057087, 5.09835442],
[5.98608603, 4.87845464],
[2.42049857, 6.34739079],
[4.28586677, 5.79125413]])
But, as I want to speed this task in a loop by using numba, there exists a compatibility problem with "axis" argument in np.all function (it is not implemented in nopython mode). So, my question is, is it possible to avoid such argument in any way? any workaround?
Upvotes: 11
Views: 4044
Reputation: 922
I encountered the same issue and decided to do some experimenting. I discovered that, if the axis axis
can be an integer literal (i.e. it is known ahead of time and doesn't need to be retrieved from a variable), there is a Numba-compatible alternative. That being said, I also found these solutions to be slower with JIT compilation in my test function, so be sure to benchmark your function to make sure there is an actual net improvement if you want to use this.
As others have pointed out, Numba doesn't support the axis
argument of several NumPy functions including np.all
. The first potential solution I thought of is np.amin
(aka np.ndarray.min
): np.all(a, axis=axis)
is identical to np.amin(a, axis=axis)
for boolean arrays, and identical to np.amin(a, axis=axis).astype('bool')
for numerical arrays. Unfortunately, np.amin
is also in the list of functions for which the axis
argument is not supported. However, np.argmin
does support the axis
argument, and so does np.take_along_axis
.
Therefore, np.all(a, axis=axis)
can be replaced with
For numeric arrays:
np.take_along_axis(a, np.expand_dims(np.argmin(a, axis=axis), axis), axis)[
(:, ){axis}
0].astype('bool')
For boolean arrays:
np.take_along_axis(a, np.expand_dims(np.argmin(a.astype('int64'), axis=axis), axis), axis)[
(:, ){axis}
0]
.astype('int64')
is needed, I don't know, so I filed a bug report.The separated parts, (:, ){axis}
, should be replaced with axis
repetitions of :,
so that the correct axis is eliminated. For example, if a
is a boolean array and axis
is 2
, you would use
np.take_along_axis(a, np.expand_dims(np.argmin(a.astype('int64'), axis=2), 2), 2)[:, :, 0]
.
All I can say about this is, if you really need a numpy.all
alternative within a function that overall would highly benefit from JIT compilation, this solution is suitable. If you're really just looking to speed up all
by itself, you won't have much luck.
test.py
import numba
import numpy as np
# @numba.njit # raises a TypingError
def using_all():
n = np.arange(10000).reshape((-1, 5)) # numeric array
b = n < 4888 # boolean array
return (np.all(n, axis=1),
np.all(b, axis=1))
# @numba.njit # raises a TypingError
def using_amin():
n = np.arange(10000).reshape((-1, 5)) # numeric array
b = n < 4888 # boolean array
return (np.amin(n, axis=1).astype('bool'),
np.amin(b, axis=1))
@numba.njit # doesn't raise a TypingError
def using_take_along_axis():
n = np.arange(10000).reshape((-1, 5)) # numeric array
b = n < 4888 # boolean array
return (np.take_along_axis(n, np.expand_dims(np.argmin(n, axis=1), 1), 1)[:, 0].astype('bool'),
np.take_along_axis(b, np.expand_dims(np.argmin(b.astype('int64'), axis=1), 1), 1)[:, 0])
if __name__ == '__main__':
a = using_all()
m = using_amin()
assert np.all(a[0] == m[0])
assert np.all(a[1] == m[1])
t = using_take_along_axis()
assert np.all(a[0] == t[0])
assert np.all(a[1] == t[1])
PS C:\> python -m timeit -n 10000 -s 'from test import using_all; using_all()' 'using_all()'
10000 loops, best of 5: 32.9 usec per loop
PS C:\> python -m timeit -n 10000 -s 'from test import using_amin; using_amin()' 'using_amin()'
10000 loops, best of 5: 43.5 usec per loop
PS C:\> python -m timeit -n 10000 -s 'from test import using_take_along_axis; using_take_along_axis()' 'using_take_along_axis()'
10000 loops, best of 5: 55.4 usec per loop
Upvotes: 1
Reputation: 4319
I really, really, really wish numba supported optional keyword arguments. Until it does, I'm pretty much ignoring it. However, some hacks are possible here.
You'll need to take extra care for anything that has other than 2 dimensions or has lengths that might be zero.
import numpy as np
from numba import njit
@njit(cache=True)
def np_all_axis0(x):
"""Numba compatible version of np.all(x, axis=0)."""
out = np.ones(x.shape[1], dtype=np.bool8)
for i in range(x.shape[0]):
out = np.logical_and(out, x[i, :])
return out
@njit(cache=True)
def np_all_axis1(x):
"""Numba compatible version of np.all(x, axis=1)."""
out = np.ones(x.shape[0], dtype=np.bool8)
for i in range(x.shape[1]):
out = np.logical_and(out, x[:, i])
return out
@njit(cache=True)
def np_any_axis0(x):
"""Numba compatible version of np.any(x, axis=0)."""
out = np.zeros(x.shape[1], dtype=np.bool8)
for i in range(x.shape[0]):
out = np.logical_or(out, x[i, :])
return out
@njit(cache=True)
def np_any_axis1(x):
"""Numba compatible version of np.any(x, axis=1)."""
out = np.zeros(x.shape[0], dtype=np.bool8)
for i in range(x.shape[1]):
out = np.logical_or(out, x[:, i])
return out
if __name__ == '__main__':
x = np.array([[1, 1, 0, 0], [1, 0, 1, 0]], dtype=bool)
np.testing.assert_array_equal(np.all(x, axis=0), np_all_axis0(x))
np.testing.assert_array_equal(np.all(x, axis=1), np_all_axis1(x))
np.testing.assert_array_equal(np.any(x, axis=0), np_any_axis0(x))
np.testing.assert_array_equal(np.any(x, axis=1), np_any_axis1(x))
I'm not sure how performant this will be, but if you really need to call the function in a higher level jit'ed function, then this will let you do it.
Upvotes: 11
Reputation: 34
No optional flag for numpy.all() is supported by numba: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
If you insist to use numba, the only way is to code in one another way.
Upvotes: 0