Dani
Dani

Reputation: 544

workaround for numpy np.all axis argument; compatibility with numba

I have a function that, given a numpy array of xy coordinates, it filters those which lies within a box of side L

import numpy as np
from numba import njit

np.random.seed(65238758)

L = 10
N = 1000
xy = np.random.uniform(0, 50, (N, 2))
box = np.array([
    [0,0],  # lower-left
    [L,L]  # upper-right
]) 

def sinjit(xy, box):
    mask = np.all(np.logical_and(xy >= box[0], xy <= box[1]), axis=1)
    return xy[mask]

If I run this function it returns the correct result:

sinjit(xy, box)

Output: array([[5.53200522, 7.86890708],
       [4.60188554, 9.15249881],
       [9.072563  , 5.6874726 ],
       [4.48976127, 8.73258166],
       ...
       [6.29683131, 5.34225758],
       [2.68057087, 5.09835442],
       [5.98608603, 4.87845464],
       [2.42049857, 6.34739079],
       [4.28586677, 5.79125413]])

But, as I want to speed this task in a loop by using numba, there exists a compatibility problem with "axis" argument in np.all function (it is not implemented in nopython mode). So, my question is, is it possible to avoid such argument in any way? any workaround?

Upvotes: 11

Views: 4044

Answers (3)

QuaternionsRock
QuaternionsRock

Reputation: 922

I encountered the same issue and decided to do some experimenting. I discovered that, if the axis axis can be an integer literal (i.e. it is known ahead of time and doesn't need to be retrieved from a variable), there is a Numba-compatible alternative. That being said, I also found these solutions to be slower with JIT compilation in my test function, so be sure to benchmark your function to make sure there is an actual net improvement if you want to use this.

As others have pointed out, Numba doesn't support the axis argument of several NumPy functions including np.all. The first potential solution I thought of is np.amin (aka np.ndarray.min): np.all(a, axis=axis) is identical to np.amin(a, axis=axis) for boolean arrays, and identical to np.amin(a, axis=axis).astype('bool') for numerical arrays. Unfortunately, np.amin is also in the list of functions for which the axis argument is not supported. However, np.argmin does support the axis argument, and so does np.take_along_axis.

Therefore, np.all(a, axis=axis) can be replaced with

For numeric arrays:

np.take_along_axis(a, np.expand_dims(np.argmin(a, axis=axis), axis), axis)[ (:, ){axis} 0].astype('bool')

For boolean arrays:

np.take_along_axis(a, np.expand_dims(np.argmin(a.astype('int64'), axis=axis), axis), axis)[ (:, ){axis} 0]

The separated parts, (:, ){axis}, should be replaced with axis repetitions of :, so that the correct axis is eliminated. For example, if a is a boolean array and axis is 2, you would use

np.take_along_axis(a, np.expand_dims(np.argmin(a.astype('int64'), axis=2), 2), 2)[:, :, 0].

Benchmarks

All I can say about this is, if you really need a numpy.all alternative within a function that overall would highly benefit from JIT compilation, this solution is suitable. If you're really just looking to speed up all by itself, you won't have much luck.

test.py

import numba
import numpy as np


# @numba.njit  # raises a TypingError
def using_all():
    n = np.arange(10000).reshape((-1, 5))  # numeric array
    b = n < 4888  # boolean array
    return (np.all(n, axis=1),
            np.all(b, axis=1))


# @numba.njit  # raises a TypingError
def using_amin():
    n = np.arange(10000).reshape((-1, 5))  # numeric array
    b = n < 4888  # boolean array
    return (np.amin(n, axis=1).astype('bool'),
            np.amin(b, axis=1))


@numba.njit  # doesn't raise a TypingError
def using_take_along_axis():
    n = np.arange(10000).reshape((-1, 5))  # numeric array
    b = n < 4888  # boolean array
    return (np.take_along_axis(n, np.expand_dims(np.argmin(n, axis=1), 1), 1)[:, 0].astype('bool'),
            np.take_along_axis(b, np.expand_dims(np.argmin(b.astype('int64'), axis=1), 1), 1)[:, 0])


if __name__ == '__main__':
    a = using_all()
    m = using_amin()
    assert np.all(a[0] == m[0])
    assert np.all(a[1] == m[1])
    t = using_take_along_axis()
    assert np.all(a[0] == t[0])
    assert np.all(a[1] == t[1])
PS C:\> python -m timeit -n 10000 -s 'from test import using_all; using_all()' 'using_all()'           
10000 loops, best of 5: 32.9 usec per loop
PS C:\> python -m timeit -n 10000 -s 'from test import using_amin; using_amin()' 'using_amin()'
10000 loops, best of 5: 43.5 usec per loop
PS C:\> python -m timeit -n 10000 -s 'from test import using_take_along_axis; using_take_along_axis()' 'using_take_along_axis()'
10000 loops, best of 5: 55.4 usec per loop

Upvotes: 1

DStauffman
DStauffman

Reputation: 4319

I really, really, really wish numba supported optional keyword arguments. Until it does, I'm pretty much ignoring it. However, some hacks are possible here.

You'll need to take extra care for anything that has other than 2 dimensions or has lengths that might be zero.

import numpy as np
from numba import njit

@njit(cache=True)
def np_all_axis0(x):
    """Numba compatible version of np.all(x, axis=0)."""
    out = np.ones(x.shape[1], dtype=np.bool8)
    for i in range(x.shape[0]):
        out = np.logical_and(out, x[i, :])
    return out

@njit(cache=True)
def np_all_axis1(x):
    """Numba compatible version of np.all(x, axis=1)."""
    out = np.ones(x.shape[0], dtype=np.bool8)
    for i in range(x.shape[1]):
        out = np.logical_and(out, x[:, i])
    return out

@njit(cache=True)
def np_any_axis0(x):
    """Numba compatible version of np.any(x, axis=0)."""
    out = np.zeros(x.shape[1], dtype=np.bool8)
    for i in range(x.shape[0]):
        out = np.logical_or(out, x[i, :])
    return out

@njit(cache=True)
def np_any_axis1(x):
    """Numba compatible version of np.any(x, axis=1)."""
    out = np.zeros(x.shape[0], dtype=np.bool8)
    for i in range(x.shape[1]):
        out = np.logical_or(out, x[:, i])
    return out

if __name__ == '__main__':
    x = np.array([[1, 1, 0, 0], [1, 0, 1, 0]], dtype=bool)
    np.testing.assert_array_equal(np.all(x, axis=0), np_all_axis0(x))
    np.testing.assert_array_equal(np.all(x, axis=1), np_all_axis1(x))
    np.testing.assert_array_equal(np.any(x, axis=0), np_any_axis0(x))
    np.testing.assert_array_equal(np.any(x, axis=1), np_any_axis1(x))

I'm not sure how performant this will be, but if you really need to call the function in a higher level jit'ed function, then this will let you do it.

Upvotes: 11

Ali Khosravi
Ali Khosravi

Reputation: 34

No optional flag for numpy.all() is supported by numba: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

If you insist to use numba, the only way is to code in one another way.

Upvotes: 0

Related Questions