Reputation: 47
I want to use boolean indexing on NumPy arrays and pandas Series to select all rows with a value < x in a certain column, but rows with NaN values should be included as well. I tried it as in this small example, but I get the error "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
import numpy as np
a = [0.5, 0, 1, 17, np.nan]
b = np.array(a)
c = b[b < 3 or b == np.nan]
Upvotes: 2
Views: 304
Reputation: 2431
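You need to use parentheses, replace or with the | operator, and use np.isnan instead of ==. Python's or tries to reduce each array to a single truth value, which is what raises the ValueError, and NaN never compares equal to anything, not even itself, so b == np.nan is False everywhere.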
c = b[(b < 3) | np.isnan(b)]
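To make the NaN point concrete, here is a minimal sketch using the array from the question. The names eq_mask and nan_mask are only for illustration, and the errstate wrapper is there merely in case your NumPy version warns on the comparison.

```python
import numpy as np

b = np.array([0.5, 0, 1, 17, np.nan])

# NaN never compares equal to anything, including itself,
# so an equality test cannot find it.
eq_mask = (b == np.nan)   # False at every position
nan_mask = np.isnan(b)    # True only where the value is NaN

# Combine the two element-wise conditions with |, each in parentheses.
with np.errstate(invalid="ignore"):
    c = b[(b < 3) | nan_mask]

print(eq_mask.any())  # False
print(c)              # the three values below 3, followed by nan
```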
The RuntimeWarning is perhaps because this tries to compare NaN values to a number. You can try to suppress it inside a with block:
with np.errstate(invalid='ignore'):
    c = b[(b < 3) | np.isnan(b)]
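The question also mentions pandas. The same pattern should carry over, with Series.isna() taking the place of np.isnan. This is only a sketch: the column name "value" is made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 0, 1, 17, np.nan])

# Element-wise comparison OR-ed with the missing-value mask.
selected = s[(s < 3) | s.isna()]

# Same idea for a DataFrame column ("value" is a hypothetical name).
df = pd.DataFrame({"value": s})
rows = df[(df["value"] < 3) | df["value"].isna()]

print(selected.index.tolist())  # [0, 1, 2, 4]
print(len(rows))                # 4
```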
Upvotes: 1
Reputation: 1260
One of these should do the trick:
# np.isnan is needed here: b == np.nan is False for every element
c = b[np.logical_or(b < 3, np.isnan(b))]
# or
c = b[np.where((b < 3) | np.isnan(b))]
# or
c = b[(b < 3) | np.isnan(b)]
I don't know if there are any benefits to one over the other, but if I had to guess I'd say the np.where one might be slightly faster? Just a guess.
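For what it's worth, a quick check that the three spellings select the same elements when np.isnan is used for the NaN test. It says nothing about speed; note only that np.where first turns the boolean mask into index arrays, which is an extra step. The variable names are just for this sketch.

```python
import numpy as np

b = np.array([0.5, 0, 1, 17, np.nan])
mask = (b < 3) | np.isnan(b)

via_operator = b[mask]
via_logical_or = b[np.logical_or(b < 3, np.isnan(b))]
via_where = b[np.where(mask)]  # mask is converted to integer indices first

# assert_array_equal treats NaNs in matching positions as equal.
np.testing.assert_array_equal(via_operator, via_logical_or)
np.testing.assert_array_equal(via_operator, via_where)
print(np.where(mask)[0].tolist())  # [0, 1, 2, 4]
```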
Upvotes: 0