jolvi

Reputation: 4651

numpy.logical_and versus multiplication

Given three numpy arrays a, b, and c (EDIT: of the same shape/size), it seems that for non-complex numbers

a * b * c != 0  # test element-wise whether all are non-zero

gives the same result as:

np.logical_and(a, np.logical_and(b, c))

Is there a hidden pitfall in the first version? Is there even a simpler way to test this?

Upvotes: 4

Views: 9081

Answers (4)

jolvi

Reputation: 4651

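Giving a short summary: a * b * c != 0 can give wrong results because of overflow (integer products wrap around and can land exactly on 0) or underflow (a product of tiny floats can round to 0.0). The alternative ~((a == 0) | (b == 0) | (c == 0)) seems to execute faster than the other implementations benchmarked in the answers.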
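A minimal sketch of both failure modes. The dtypes and values are assumptions chosen to trigger the problem: int8 wraps around at 256, so 16 * 16 becomes 0 (with the default int64 you need larger values, e.g. 2**32 * 2**32), and float64 products smaller than about 5e-324 round to 0.0. NumPy array arithmetic wraps integer overflow and ignores underflow by default, so typically no error is raised.

```python
import numpy as np

# Overflow: in int8, 16 * 16 = 256 wraps around to 0
a = np.array([16], dtype=np.int8)
b = np.array([16], dtype=np.int8)
c = np.array([1], dtype=np.int8)
overflow_mul = a * b * c != 0                        # wrongly False
overflow_cmp = ~((a == 0) | (b == 0) | (c == 0))     # correctly True

# Underflow: 1e-200 * 1e-200 = 1e-400 is below the smallest float64, so it becomes 0.0
x = np.array([1e-200])
y = np.array([1e-200])
z = np.array([1.0])
underflow_mul = x * y * z != 0                       # wrongly False
underflow_cmp = ~((x == 0) | (y == 0) | (z == 0))    # correctly True

print(overflow_mul, overflow_cmp)
print(underflow_mul, underflow_cmp)
```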

Upvotes: 0

Prune

Reputation: 77837

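They are the same: 0 and None are False, and the other values in the example below are True (in general, empty containers such as '' and [] are also False). However, the logical test is faster. If you have a lot of arrays in a list, consider using Python's built-in all and any functions.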


For example:

for value in [True, False, 0, 1, None, 7, 'a', [False, False, False]]:
    if value:
        print(value, True)
    else:
        print(value, False)

Output:

True True
False False
0 False
1 True
None False
7 True
a True
[False, False, False] True
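Python's built-in all() loops element by element, which is slow on arrays. A vectorized analogue for a long list of same-shaped arrays is a logical_and reduction. This is a sketch; the list of ten random integer arrays and the seed are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical input: many same-shaped integer arrays
arrays = [rng.integers(0, 5, size=1000) for _ in range(10)]

# Vectorized analogue of all(): turn each array into a "non-zero" mask, then AND them together
nonzero_masks = [arr != 0 for arr in arrays]
all_nonzero = np.logical_and.reduce(nonzero_masks)

# Equivalent formulation: stack along a new axis 0 and test along that axis
all_nonzero_stacked = np.all(np.stack(arrays) != 0, axis=0)

# For contrast, pure-Python all() has to visit every position one tuple at a time
all_nonzero_python = np.array([all(col) for col in zip(*arrays)])
```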

Upvotes: -1

Curt F.

Reputation: 4824

Some observations:

import numpy as np
import timeit  # note: %timeit below is an IPython/Jupyter magic, not plain Python

a = np.random.randint(0, 5, 100000)
b = np.random.randint(0, 5, 100000)
c = np.random.randint(0, 5, 100000)

method_one = np.logical_and(np.logical_and(a, b), c)
%timeit np.logical_and(np.logical_and(a, b), c)

method_two = a*b*c != 0
%timeit a*b*c != 0

method_three = np.logical_and(np.logical_and(a.astype('bool'), b.astype('bool')), c.astype('bool'))
%timeit np.logical_and(np.logical_and(a.astype('bool'), b.astype('bool')), c.astype('bool'))

method_four = a.astype('bool') * b.astype('bool') * c.astype('bool') != 0
%timeit a.astype('bool') * b.astype('bool') * c.astype('bool') != 0


# verify all methods give equivalent results
all([
    np.all(method_one == method_two),
    np.all(method_one == method_three),
    np.all(method_one == method_four),
])

1000 loops, best of 3: 713 µs per loop
1000 loops, best of 3: 341 µs per loop
1000 loops, best of 3: 252 µs per loop
1000 loops, best of 3: 388 µs per loop

True

Some interpretations:

  1. The speed of the a*b*c != 0 method depends on the dtype of the vectors, since the multiplication happens first. If you have floats, 64-bit integers, or some other wide dtype, this step takes longer than for vectors of the same length that are boolean or small integers. Coercing to a bool dtype speeds this method up. If the vectors have different dtypes, things are even slower: multiplying an integer array by a float array requires converting the integers to floats before the result is coerced to boolean. Not optimal.

  2. For reasons I don't understand, Prune's statement that "the logical test is faster" seems to hold only when the input vectors are already boolean. Perhaps the implicit coercion to boolean inside a plain logical_and() call is slower than an explicit .astype('bool').

  3. The fastest way to go seems to be (1) coerce inputs to boolean ahead of time and then (2) use np.logical_and().
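The %timeit lines above only run inside IPython. A plain-script version of the same comparison, using the standard timeit module, is sketched below; the function names, seed, and repeat counts are assumptions, and absolute numbers will vary with hardware and NumPy version:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.integers(0, 5, 100_000) for _ in range(3))

def logical_and_direct():
    # Method one: logical_and performs the bool conversion itself
    return np.logical_and(np.logical_and(a, b), c)

def logical_and_precast():
    # Method three: coerce to bool first, then combine
    return np.logical_and(np.logical_and(a.astype(bool), b.astype(bool)), c.astype(bool))

def multiply():
    # Method two: multiply, then compare with zero
    return a * b * c != 0

timings = {}
for fn in (logical_and_direct, logical_and_precast, multiply):
    # Best of 3 repeats of 100 calls each, reported per call
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    timings[fn.__name__] = best
    print(f"{fn.__name__:22s} {best * 1e6:8.1f} µs per call")
```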

Upvotes: 4

Divakar

Reputation: 221554

Given b and c holding real numbers, np.logical_and(b, c) essentially involves an under-the-hood conversion to booleans.

Can we do the conversion upfront? If so, would that help?
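The results (not the internal mechanism) of that implicit conversion can be checked directly: logical_and treats every non-zero value as True, exactly like astype(bool) or a != 0 comparison. A small sketch; the random float inputs and seed are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.normal(size=1000)
c = rng.integers(0, 3, size=1000).astype(float)
b[::7] = 0.0  # force some exact zeros into b

implicit = np.logical_and(b, c)                          # conversion happens inside the ufunc
explicit = np.logical_and(b.astype(bool), c.astype(bool))  # conversion done upfront
compared = (b != 0) & (c != 0)                           # conversion via comparison with zero
```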

Now, checking whether ALL corresponding elements are non-zero is equivalent (by De Morgan's law) to negating the check of whether ANY of the corresponding elements is zero, i.e.

~((a == 0) + (b==0) + (c==0))

OR

~((a == 0) | (b==0) | (c==0))

Both forms work because, for boolean arrays, + acts as a logical OR, just like |. This also involves an upfront conversion to boolean via the comparison with zero, so it might help with performance. Here are the runtime numbers involved:

Case #1:

In [10]: # Setup inputs
    ...: M, N = 100, 100
    ...: a = np.random.randint(0,5,(M,N))
    ...: b = np.random.randint(0,5,(M,N))
    ...: c = np.random.randint(0,5,(M,N))
    ...: 

In [11]: %timeit np.logical_and(a, np.logical_and(b, c))
    ...: %timeit a * b * c != 0
    ...: %timeit ~((a == 0) + (b==0) + (c==0))
    ...: %timeit ~((a == 0) | (b==0) | (c==0))
    ...: 
10000 loops, best of 3: 96.6 µs per loop
10000 loops, best of 3: 78.2 µs per loop
10000 loops, best of 3: 51.6 µs per loop
10000 loops, best of 3: 51.5 µs per loop

Case #2:

In [12]: # Setup inputs
    ...: M, N = 1000, 1000
    ...: a = np.random.randint(0,5,(M,N))
    ...: b = np.random.randint(0,5,(M,N))
    ...: c = np.random.randint(0,5,(M,N))
    ...: 

In [13]: %timeit np.logical_and(a, np.logical_and(b, c))
    ...: %timeit a * b * c != 0
    ...: %timeit ~((a == 0) + (b==0) + (c==0))
    ...: %timeit ~((a == 0) | (b==0) | (c==0))
    ...: 
100 loops, best of 3: 11.4 ms per loop
10 loops, best of 3: 24.1 ms per loop
100 loops, best of 3: 9.29 ms per loop
100 loops, best of 3: 9.2 ms per loop

Case #3:

In [14]: # Setup inputs
    ...: M, N = 5000, 5000
    ...: a = np.random.randint(0,5,(M,N))
    ...: b = np.random.randint(0,5,(M,N))
    ...: c = np.random.randint(0,5,(M,N))
    ...: 

In [15]: %timeit np.logical_and(a, np.logical_and(b, c))
    ...: %timeit a * b * c != 0
    ...: %timeit ~((a == 0) + (b==0) + (c==0))
    ...: %timeit ~((a == 0) | (b==0) | (c==0))
    ...: 
1 loops, best of 3: 294 ms per loop
1 loops, best of 3: 694 ms per loop
1 loops, best of 3: 268 ms per loop
1 loops, best of 3: 268 ms per loop

Seems like there is a good percentage of benefit with the comparison to zero approach!
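The comparison-to-zero approach generalizes to any number of arrays. A minimal sketch; all_nonzero is a hypothetical helper name, and it assumes the inputs share one shape, as in the question's EDIT:

```python
import numpy as np

def all_nonzero(first, *rest):
    """Hypothetical helper: True where every input is non-zero.

    Uses De Morgan's law: "all are non-zero" == not "any is zero".
    Assumes all inputs have the same shape.
    """
    any_zero = first == 0
    for x in rest:
        any_zero |= (x == 0)  # accumulate "some input is zero at this position"
    return ~any_zero

# Usage: agrees with the logical_and version on small integers
rng = np.random.default_rng(2)
a, b, c = (rng.integers(0, 5, (100, 100)) for _ in range(3))
mask = all_nonzero(a, b, c)

# Unlike a * b * c != 0, it is immune to overflow (16 * 16 wraps to 0 in int8)
x = np.array([16], dtype=np.int8)
overflow_safe = all_nonzero(x, x)
```

Because it never multiplies, the helper's cost does not depend on how wide the dtype is beyond the comparisons themselves.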

Upvotes: 4
