alani

Reputation: 13079

numpy: efficient way to do "any" or "all" on the result of an operation

Suppose that you have two NumPy arrays, a and b, and you want to test whether any value of a is greater than the corresponding value of b.

Now you could calculate a boolean array and call its any method:

(a > b).any()

This does all the looping internally, which is good, but it has to allocate a temporary boolean array and perform the comparison on every pair, even if, say, the very first result evaluates as True.

Alternatively, you could do an explicit loop over scalar comparisons. An example implementation in the case where a and b are the same shape (so broadcasting is not required) might look like:

any(ai > bi for ai, bi in zip(a.flatten(), b.flatten()))

This will benefit from the ability to stop processing after the first True result is encountered, but with all the costs associated with an explicit loop in Python (albeit inside a comprehension).

Is there any way, either in NumPy itself or in an external library, that you could pass in a description of the operation that you wish to perform, rather than the result of that operation, and then have it perform the operation internally (in optimised low-level code) inside an "any" loop that can be broken out from?

One could imagine hypothetically some kind of interface like:

from array_operations import GreaterThan, Any

expression1 = GreaterThan('x', 'y')
expression2 = Any(expression1)

print(expression2.evaluate(x=a, y=b))

If such a thing exists, clearly it could have other uses beyond efficient evaluation of all and any, in terms of being able to create functions dynamically.

Is there anything like this?

Upvotes: 5

Views: 125

Answers (1)

John Zwinck

Reputation: 249502

One way to solve this is with delayed/deferred/lazy evaluation. The C++ community uses something called "expression templates" to achieve this; you can find an accessible overview here: http://courses.csail.mit.edu/18.337/2015/projects/TylerOlsen/18337_tjolsen_ExpressionTemplates.pdf
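To make the deferred-evaluation idea concrete, here is a minimal pure-Python sketch that borrows the hypothetical GreaterThan and Any names from the question. These classes are illustrative only and are not part of NumPy or any library I know of. The sketch assumes all inputs have the same shape, and it short-circuits between chunks rather than between individual elements, so the chunk_size parameter (also an assumption) trades early exit against per-chunk overhead.

```python
import numpy as np


class GreaterThan:
    """Deferred elementwise `left > right`; nothing is computed until evaluate()."""

    def __init__(self, left_name, right_name):
        self.left_name = left_name
        self.right_name = right_name

    def evaluate(self, env):
        # env maps variable names to (slices of) arrays.
        return env[self.left_name] > env[self.right_name]


class Any:
    """Deferred `any` over an expression, evaluated chunk by chunk."""

    def __init__(self, expr, chunk_size=4096):
        self.expr = expr
        self.chunk_size = chunk_size

    def evaluate(self, **arrays):
        # Assumes every array has the same shape (no broadcasting).
        flat = {name: np.ravel(arr) for name, arr in arrays.items()}
        n = len(next(iter(flat.values())))
        for start in range(0, n, self.chunk_size):
            stop = start + self.chunk_size
            env = {name: arr[start:stop] for name, arr in flat.items()}
            # Vectorised work on one chunk; stop as soon as a chunk has a hit.
            if self.expr.evaluate(env).any():
                return True
        return False


a = np.array([1, 5, 3])
b = np.array([2, 4, 6])
expression = Any(GreaterThan('x', 'y'))
print(expression.evaluate(x=a, y=b))
```

Each chunk is still compared in full, so this only pays off when a True result tends to appear early in large arrays.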

In Python, the easiest way to do this is with Numba. You write the function you need in plain Python using for loops, then decorate it with @numba.njit, which compiles it to machine code so the loop runs at native speed and can still return early. Like this:

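import numba

@numba.njit
def any_greater(a, b):
    # Compiled loop: returns as soon as one pair satisfies a > b.
    # ravel() returns a view for contiguous arrays; flatten() always copies.
    for ai, bi in zip(a.ravel(), b.ravel()):
        if ai > bi:
            return True
    return False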
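One caveat with a zip-based loop is that zip stops at the shorter input, so arrays of different shapes would be compared without any error. A hedged sketch of a wrapper that checks shapes first is below. The names any_greater_indexed and any_greater_same_shape are made up for this example, and the fallback decorator is only there so the sketch still runs (uncompiled and slowly) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run the same loop uncompiled.
    def njit(func):
        return func


@njit
def any_greater_indexed(a_flat, b_flat):
    # Expects two 1-D arrays of equal length; exits on the first hit.
    for i in range(a_flat.size):
        if a_flat[i] > b_flat[i]:
            return True
    return False


def any_greater_same_shape(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        # Broadcasting is deliberately not handled here.
        raise ValueError("a and b must have the same shape")
    return bool(any_greater_indexed(a.ravel(), b.ravel()))


a = np.arange(6).reshape(2, 3)
b = a + 1
print(any_greater_same_shape(a, b))
```

With Numba installed, the first call includes compilation time, so time a second call when benchmarking.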

There is/was a NumPy enhancement proposal that could help your use case, but I don't think it has been implemented: https://docs.scipy.org/doc/numpy-1.13.0/neps/deferred-ufunc-evaluation.html

Upvotes: 4
