PDistl

Reputation: 47

Boolean indexing with two conditions to include NaN

I want to use boolean indexing on NumPy arrays and pandas Series to select all rows where a certain column has a value < x, but rows with NaN values in that column should be included as well. I tried it as in the small example below, but I get the error "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

import numpy as np

a = [0.5, 0, 1, 17, np.nan]
b = np.array(a)
c = b[b < 3 or b == np.nan]  # this line raises the ValueError

Upvotes: 2

Views: 304

Answers (2)

nocibambi

Reputation: 2431

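You need to put parentheses around the comparison, replace or with the | operator, and use np.isnan instead of ==. The or keyword tries to reduce each whole array to a single True/False, which is what raises the ValueError; | combines the two masks element-wise, but it binds more tightly than <, hence the parentheses; and NaN never compares equal to anything (not even itself), so b == np.nan is False for every element.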

c = b[(b < 3) | np.isnan(b)]
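A short check of why each of the three changes matters, sketched with the array from the question (the variable names are only for illustration):

```python
import numpy as np

b = np.array([0.5, 0, 1, 17, np.nan])

# `or` needs a single True/False for its left operand, and a multi-element
# boolean array cannot provide one, so Python raises the ValueError.
try:
    (b < 3) or np.isnan(b)
    or_raised = False
except ValueError:
    or_raised = True

# NaN compares unequal to everything (itself included), so `== np.nan`
# is False everywhere, while np.isnan flags the NaN position.
eq_mask = b == np.nan
nan_mask = np.isnan(b)

with np.errstate(invalid="ignore"):  # silence a possible NaN-comparison warning
    # Without parentheses, `|` binds first: b < (3 | np.isnan(b)).
    # That runs, but the NaN position comes out False and is silently dropped.
    unparenthesised = b < 3 | np.isnan(b)
    # The corrected expression keeps 0.5, 0.0, 1.0 and NaN; 17.0 is dropped.
    c = b[(b < 3) | np.isnan(b)]

print(or_raised, eq_mask.any(), nan_mask[-1], unparenthesised[-1])  # True False True False
print(c)
```

The unparenthesised version is the sneakiest of the three mistakes, because it does not raise anything; it just quietly loses the NaN rows.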

Depending on your NumPy version, b < 3 may also emit a RuntimeWarning (invalid value encountered in less), because it compares the NaN entry with a number. You can suppress it for just that block with np.errstate:

with np.errstate(invalid='ignore'):
    c = b[(b < 3) | np.isnan(b)]
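The question also asks about pandas. A Series supports the same pattern; this is a minimal sketch assuming a Series built from the question's values, with Series.isna() taking the place of np.isnan and x standing in for the question's threshold:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 0, 1, 17, np.nan])
x = 3  # threshold from the question's "< x"

# Same structure as the NumPy version: parentheses around the comparison,
# `|` instead of `or`, and isna() instead of `== np.nan`.
# s < x is False at the NaN position, so isna() is what brings that row back.
selected = s[(s < x) | s.isna()]

# Index labels are preserved, so label 3 (the value 17) is the one missing.
print(selected)
```

For a DataFrame, building the same mask from one column, e.g. df[(df["col"] < x) | df["col"].isna()] with a hypothetical column name "col", selects the matching rows.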

Upvotes: 1

mattlangford

Reputation: 1260

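Any of these should do the trick. Note that the NaN test has to use np.isnan: NaN compares unequal to everything, itself included, so b == np.nan is False for every element and would silently drop the NaN rows.

c = b[np.logical_or(b < 3, np.isnan(b))]
# or
c = b[np.where((b < 3) | np.isnan(b))]
# or
c = b[(b < 3) | np.isnan(b)]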
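A quick sketch confirming that the three forms select the same elements once NaN is tested with np.isnan, and what happens if b == np.nan is used instead (variable names are illustrative only):

```python
import numpy as np

b = np.array([0.5, 0, 1, 17, np.nan])

with np.errstate(invalid="ignore"):  # silence a possible NaN-comparison warning
    below = b < 3
is_nan = np.isnan(b)

c1 = b[np.logical_or(below, is_nan)]  # function form of `|`
idx = np.where(below | is_nan)         # one-argument np.where: a tuple of index arrays
c2 = b[idx]                            # integer (fancy) indexing with those indices
c3 = b[below | is_nan]                 # plain boolean mask

# For comparison: testing with `== np.nan` matches nothing, so the NaN is lost.
c_wrong = b[below | (b == np.nan)]

# equal_nan=True (NumPy 1.19+) makes array_equal treat the NaNs as matching.
print(np.array_equal(c1, c2, equal_nan=True), np.array_equal(c1, c3, equal_nan=True))  # True True
print(idx[0], len(c_wrong))  # [0 1 2 4] 3
```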

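All three give the same result: np.logical_or is the function form of |, and np.where with a single argument returns the indices where the mask is True, which are then used to index b. That makes np.where an extra step rather than a shortcut, so I wouldn't assume it is faster; any difference is likely small, and timing the variants on your own data is the only way to know.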
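One way to compare the boolean-mask and np.where styles on your own machine is a small timeit run. This is a minimal sketch: the array size, NaN fraction and repeat count are arbitrary assumptions, and with_mask / with_where are hypothetical helper names.

```python
import timeit

import numpy as np

# Hypothetical test data: 100,000 floats in [0, 10), about 10% set to NaN.
rng = np.random.default_rng(0)
data = rng.uniform(0, 10, size=100_000)
data[rng.random(data.size) < 0.1] = np.nan


def with_mask():
    # Index directly with the combined boolean mask.
    with np.errstate(invalid="ignore"):
        return data[(data < 3) | np.isnan(data)]


def with_where():
    # Convert the same mask to indices first, then index with them.
    with np.errstate(invalid="ignore"):
        return data[np.where((data < 3) | np.isnan(data))]


# Total seconds for 200 calls of each; numbers vary by machine and NumPy version.
t_mask = timeit.timeit(with_mask, number=200)
t_where = timeit.timeit(with_where, number=200)
print(f"boolean mask: {t_mask:.4f} s   np.where: {t_where:.4f} s")
```

Both helpers return the same selection, so the comparison isolates the cost of the indexing style rather than the filtering logic.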

Upvotes: 0
