Luke Davis

Reputation: 2666

Unexpected behavior of boolean operations in NumPy ndarray inline comparisons

I find that combining multiple boolean comparisons on NumPy ndarrays using &, |, ==, >=, etc. often gives unexpected results, where Python's usual order of operations seems, on the surface, to be violated (I was wrong about this; for example, True | False==True yields True). What are the "rules", or the mechanisms under the hood, that explain these results? Here are a few examples:

  1. Comparing a boolean ndarray to the results of an elementwise comparison on a non-boolean ndarray:

    In [36]: a = np.array([1,2,3])
    In [37]: b = np.array([False, True, False])
    In [38]: b & a==2 # unexpected, with no error raised!
    Out[38]: array([False, False, False], dtype=bool)
    
    In [39]: b & (a==2) # enclosing in parentheses resolves this
    Out[39]: array([False,  True, False], dtype=bool)
    
  2. Elementwise &/| on boolean and non-boolean ndarrays:

    In [79]: b = np.array([True,False,True])
    
    In [80]: b & a # the bool array is upcast to int, so the result is integers!
    Out[80]: array([1, 0, 1])
    
  3. Finding elements of array within two values:

    In [47]: a>=2 & a<=2 # have seen this in different stackexchange threads
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    
    In [48]: (a>=2) & a<=2 # similar to In[38], but this time the result is all True
    Out[48]: array([ True,  True,  True], dtype=bool)
    
    In [49]: (a>=2) & (a<=2) # expected results
    Out[49]: array([False,  True, False], dtype=bool)
    
  4. &/| yielding results not in [0, 1] (which is what I would expect if a boolean result were coerced back into int):

    In [90]: a & 2
    Out[90]: array([0, 2, 2])
    

I welcome additional examples of this behavior.

Upvotes: 0

Views: 197

Answers (2)

hpaulj

Reputation: 231425

a>=2 & a<=2 is evaluated as a>=(2 & a)<=2

The parenthesized part, 2 & a, evaluates to array([0, 2, 2]).

a>=(2 & a) is a boolean array. But it is part of a Python chained comparison a<x<b, which is evaluated like (a<x) and (x<b), with x computed only once. The and short-circuits: Python evaluates a<x, takes its truth value, and depending on that value might skip the x<b part entirely. Here that means taking the truth value of the array a>=(2 & a).
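A minimal sketch of that expansion, assuming the same a as in the question. The temporary t stands in for the middle operand, and the attempts dictionary is just an illustrative way to run both forms side by side; both raise the same ValueError.

```python
import numpy as np

a = np.array([1, 2, 3])

# `a >= 2 & a <= 2` groups as the chained comparison `a >= (2 & a) <= 2`.
t = 2 & a          # the middle operand, which Python evaluates only once
print(t)           # [0 2 2]

# Python evaluates `x >= t <= z` like `(x >= t) and (t <= z)`; `and` needs
# the truth value of the left array, which NumPy refuses to give.
attempts = {
    "original expression": lambda: a >= 2 & a <= 2,
    "explicit expansion": lambda: (a >= t) and (t <= 2),
}
for label, attempt in attempts.items():
    try:
        attempt()
    except ValueError as exc:
        print(f"{label}: ValueError: {exc}")
```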

The familiar ValueError ("The truth value of an array with more than one element is ambiguous") arises whenever a multi-element boolean array is used in a scalar Python boolean context like this one.
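A short sketch of the scalar boolean contexts that trigger this error, and the explicit reductions the message suggests, using a hypothetical mask array:

```python
import numpy as np

mask = np.array([False, True, False])

# `if`, `while`, `and`, `or`, `not` and bool() all ask for a single truth
# value, which NumPy refuses to invent for a multi-element array.
try:
    if mask:
        pass
except ValueError as exc:
    print("ValueError:", exc)

# Reduce the array explicitly instead, as the error message suggests.
print(mask.any())   # True: at least one element is True
print(mask.all())   # False: not every element is True
```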

Upvotes: 1

Tadhg McDonald-Jensen

Reputation: 21453

I think you are confused about the precedence of the binary operators & and | versus the comparison operators:

>>> import dis
>>> dis.dis("b & a==2")
  1           0 LOAD_NAME                0 (b)
              2 LOAD_NAME                1 (a)
              4 BINARY_AND
              6 LOAD_CONST               0 (2)
              8 COMPARE_OP               2 (==)
             10 RETURN_VALUE

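You can see here that BINARY_AND is done first (between b and a), and then the result is compared against 2. Since b & a is an integer array here (array([0, 0, 0]), because b is upcast to int), none of its elements equal 2, so the comparison is all False.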
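The dis output above depends on the Python version (3.11 and later emit BINARY_OP instead of BINARY_AND), so a version-independent way to see the same grouping is to walk the ast parse tree. The show_grouping helper below is hypothetical, written only for illustration; it handles just names, constants, binary operators and comparisons, which is enough for the expressions in the question.

```python
import ast
import numpy as np

# Hypothetical helper: map AST operator node types to their source symbols.
OPS = {
    ast.BitAnd: "&", ast.BitOr: "|",
    ast.Eq: "==", ast.NotEq: "!=",
    ast.Lt: "<", ast.LtE: "<=", ast.Gt: ">", ast.GtE: ">=",
}

def _group(node: ast.AST) -> str:
    """Render a node with explicit parentheses around every operation."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.BinOp):
        return f"({_group(node.left)} {OPS[type(node.op)]} {_group(node.right)})"
    if isinstance(node, ast.Compare):
        # A chained comparison is one Compare node with several ops.
        parts = [_group(node.left)]
        for op, right in zip(node.ops, node.comparators):
            parts += [OPS[type(op)], _group(right)]
        return "(" + " ".join(parts) + ")"
    raise NotImplementedError(type(node).__name__)

def show_grouping(expr: str) -> str:
    """Return `expr` fully parenthesized the way Python's parser groups it."""
    return _group(ast.parse(expr, mode="eval").body)

for expr in ["b & a==2", "a>=2 & a<=2", "(a>=2) & a<=2", "(a>=2) & (a<=2)"]:
    print(f"{expr:18} -> {show_grouping(expr)}")

# The grouping matters numerically: b is upcast to int before `&`,
# so b & a is an integer array that never equals 2.
a = np.array([1, 2, 3])
b = np.array([False, True, False])
print(b & a)        # [0 0 0]
print(b & a == 2)   # [False False False]
```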

The reason & and | have higher precedence than the comparison operators is that they are not intended as logical operators: they are bitwise operators, which NumPy happens to use for elementwise logic on boolean arrays. With ints, for example, I'd definitely expect the & to happen first:

if 13 & 7 == 5:  # parsed as (13 & 7) == 5
    print("the bits shared by 13 and 7 are exactly 5")
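To make the integer case concrete, here is a quick check of both possible groupings; the second line forces the comparison first, which is the grouping a reader might wrongly assume:

```python
# Python's precedence for plain ints: `&` binds tighter than `==`.
print(bin(13), bin(7), bin(13 & 7))   # 0b1101 0b111 0b101
print(13 & 7 == 5)     # grouped as (13 & 7) == 5, i.e. 5 == 5 -> True
print(13 & (7 == 5))   # comparison forced first: 13 & False -> 0
```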

It is unfortunate that NumPy cannot override the logical and and or operators, whose precedence does suit logical operations; since they cannot be overridden, we just have to live with adding lots of parentheses when combining boolean arrays.
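A minimal sketch of the two usual workarounds for the range test from the question, assuming the same a: parenthesize every comparison, or use NumPy's function forms np.logical_and and np.logical_or, which take the comparisons as separate arguments.

```python
import numpy as np

a = np.array([1, 2, 3])

# Parenthesize each comparison so `&` / `|` combine two boolean arrays.
inside = (a >= 2) & (a <= 2)
outside = (a < 2) | (a > 2)

# np.logical_and / np.logical_or take the comparisons as separate
# arguments, so no extra parentheses are needed.
inside_fn = np.logical_and(a >= 2, a <= 2)
outside_fn = np.logical_or(a < 2, a > 2)

print(inside, inside_fn)     # [False  True False] [False  True False]
print(outside, outside_fn)   # [ True False  True] [ True False  True]
```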

Note that there was a proposal (PEP 335) to allow and and or to be overloaded, but it was rejected, since it would basically have been only a small convenience for NumPy while making all other strictly boolean operations slower.
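A small sketch of why there is nothing for NumPy to hook into today: and only asks its left operand for a single truth value via `__bool__` and then returns one of the operands unchanged, so no special method ever receives both operands. The Loud and BadBool classes are hypothetical, for illustration only.

```python
class Loud:
    """Hypothetical class that reports when Python asks for its truth value."""
    def __init__(self, name, truth):
        self.name, self.truth = name, truth

    def __bool__(self):
        print(f"__bool__ called on {self.name}")
        return self.truth

x, y = Loud("x", False), Loud("y", True)

# `and` only asks x for its truth value; since x is falsy it never touches y,
# and it returns x itself rather than any combined value.
result = x and y
print(result is x)    # True

class BadBool:
    """Hypothetical class whose __bool__ tries to return a non-bool."""
    def __bool__(self):
        return [True, False]

# __bool__ must return an actual bool, so an elementwise answer is rejected.
try:
    BadBool() and 1
except TypeError as exc:
    print("TypeError:", exc)
```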

Upvotes: 2
