Reputation: 13079
Suppose that you have two NumPy arrays, a
and b
, and you want to test whether any value of a
is greater than the corresponding value of b
.
Now you could calculate a boolean array and call its any
method:
(a > b).any()
This will do all the looping internally, which is good, but it suffers from the need to perform the comparison on all the pairs even if, say, the very first result evaluates as True
.
Alternatively, you could do an explicit loop over scalar comparisons. An example implementation in the case where a
and b
are the same shape (so broadcasting is not required) might look like:
any(ai > bi for ai, bi in zip(a.flatten(), b.flatten()))
This will benefit from the ability to stop processing after the first True
result is encountered, but with all the costs associated with an explicit loop in Python (albeit inside a comprehension).
Is there any way, either in NumPy itself or in an external library, that you could pass in a description of the operation that you wish to perform, rather than the result of that operation, and then have it perform the operation internally (in optimised low-level code) inside an "any" loop that can be broken out from?
One could imagine hypothetically some kind of interface like:
from array_operations import GreaterThan, Any
expression1 = GreaterThan('x', 'y')
expression2 = Any(expression1)
print(expression2.evaluate(x=a, y=b))
If such a thing exists, clearly it could have other uses beyond efficient evaluation of all
and any
, in terms of being able to create functions dynamically.
Is there anything like this?
Upvotes: 5
Views: 125
Reputation: 249502
One way to solve this is with delayed/deferred/lazy evaluation. The C++ community uses something called "expression templates" to achieve this; you can find an accessible overview here: http://courses.csail.mit.edu/18.337/2015/projects/TylerOlsen/18337_tjolsen_ExpressionTemplates.pdf
In Python the easiest way to do this is using Numba. You basically just write the function you need in Python using for
loops, then you decorate it with @numba.njit
and it's done. Like this:
@numba.njit
def any_greater(a, b):
for ai, bi in zip(a.flatten(), b.flatten()):
if ai > bi:
return True
return False
There is/was a NumPy enhancement proposal that could help your use case, but I don't think it has been implemented: https://docs.scipy.org/doc/numpy-1.13.0/neps/deferred-ufunc-evaluation.html
Upvotes: 4