peter.petrov

Reputation: 39457

NumPy - selection from 2D array based on a Boolean condition

I came across this section in a book.

Boolean Arrays as Masks

Here, x is a plain NumPy array.


I think this behavior of x[x<5] in NumPy is inconsistent with Pandas, right?

I didn't know this, but it seems that in NumPy, x[x<5] returns a one-dimensional array (even though x is 2D).
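A minimal sketch that reproduces the behavior described, assuming a hypothetical 3x4 integer array built with np.arange (the book's actual x is not reproduced here). The mask has the same shape as x, but indexing with it returns only the selected elements, in row-major order, as a 1D array:

```python
import numpy as np

# Hypothetical 3x4 array standing in for the book's x
x = np.arange(12).reshape(3, 4)

mask = x < 5            # Boolean array with the same shape as x
print(mask.shape)       # (3, 4)

selected = x[mask]      # only the True positions are kept, flattened
print(selected)         # [0 1 2 3 4]
print(selected.shape)   # (5,)
```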

I have some prior experience with Pandas, so I expected to get back a 2D array of the same shape (3x4), filled with NaN where the condition is not satisfied and with the actual values where it is.

Why does it behave this way (i.e., return a 1D array)? That doesn't seem very useful, does it?
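One way to see why the result is 1D, sketched with a small hypothetical array: the number of True entries usually differs from row to row, so the selected elements cannot in general form a rectangular 2D array. Per the NumPy indexing documentation, x[mask] selects the same elements as x[mask.nonzero()]. The same mask is also commonly used on the left-hand side of an assignment, where it modifies x in place and the shape is preserved:

```python
import numpy as np

# Hypothetical 2x3 example array
x = np.array([[1, 7, 2],
              [8, 9, 3]])
mask = x < 5

# Row 0 keeps two elements, row 1 keeps one: no rectangular result exists
print(mask.sum(axis=1))      # [2 1]
print(x[mask])               # [1 2 3]

# Boolean indexing picks the same elements as the coordinates of the True entries
rows, cols = np.nonzero(mask)
print(x[rows, cols])         # [1 2 3]

# As an assignment target, the mask writes in place and x keeps its 2x3 shape
x[mask] = 0
print(x.tolist())            # [[0, 7, 0], [8, 9, 0]]
```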

Is there another NumPy function that works the way Pandas does, i.e., without changing the shape of the original array?
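Besides np.where, NumPy's masked arrays (numpy.ma) are another shape-preserving option, sketched here on a hypothetical 3x4 array. This is a swapped-in alternative, not the method from the book: np.ma.masked_where hides the entries where its condition is True, and reductions such as sum skip the hidden entries, somewhat like Pandas skipping NaN:

```python
import numpy as np

# Hypothetical 3x4 array
x = np.arange(12).reshape(3, 4)

# Hide every entry that does NOT satisfy x < 5
m = np.ma.masked_where(~(x < 5), x)

print(m.shape)             # (3, 4): the original shape is preserved
print(m.count())           # 5 unmasked entries
print(m.sum())             # 10, masked entries are ignored (0+1+2+3+4)

# Convert back to a plain ndarray, putting -1 in the masked slots
print(m.filled(-1).tolist())
```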

Upvotes: 1

Views: 2316

Answers (1)

André

Reputation: 1068

The expected behavior you describe can be replicated using np.where:

import numpy as np

x = 10.*np.cos( np.arange(0, 8, 0.5) ).reshape([4,4]) # Just some 4x4 test data

print(x)
#[[10.          8.77582562  5.40302306  0.70737202]
# [-4.16146837 -8.01143616 -9.89992497 -9.36456687]
# [-6.53643621 -2.10795799  2.83662185  7.08669774]
# [ 9.60170287  9.76587626  7.53902254  3.46635318]]

print(x > 5)
#[[ True  True  True False]
# [False False False False]
# [False False False  True]
# [ True  True  True False]]

print(np.where(x > 5, x, np.nan))
#[[10.          8.77582562  5.40302306         nan]
# [        nan         nan         nan         nan]
# [        nan         nan         nan  7.08669774]
# [ 9.60170287  9.76587626  7.53902254         nan]]

# In contrast to:
print(x[x>5])
# [10.          8.77582562  5.40302306  7.08669774  9.60170287  9.76587626  7.53902254]

As you can see, this replaces every entry that does not satisfy the condition with np.nan. Note that this function is not limited to np.nan; a fallback value of 0 is also common. In addition, because none of the function's parameters need to be scalars (they only need to be broadcastable to a common shape), this method is useful for choosing element-wise between the entries of two matrices based on a condition.
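A short sketch of the two uses mentioned above, with hypothetical 2x2 arrays a and b: choosing element-wise between two arrays, using 0 as a scalar fallback, and passing a 1D array that NumPy broadcasts across the rows:

```python
import numpy as np

# Hypothetical example arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])
cond = np.array([[True, False], [False, True]])

# Take from a where cond is True, otherwise from b
picked = np.where(cond, a, b)
print(picked.tolist())      # [[1, 20], [30, 4]]

# Scalar fallback of 0 instead of np.nan (keeps the integer dtype)
zeroed = np.where(a > 2, a, 0)
print(zeroed.tolist())      # [[0, 0], [3, 4]]

# The "else" argument is broadcast: [100, 200] is reused for every row
broadcast = np.where(cond, a, np.array([100, 200]))
print(broadcast.tolist())   # [[1, 200], [100, 4]]
```

Because np.nan is a float, np.where(cond, x, np.nan) returns a float array even when x holds integers; a scalar fallback such as 0 avoids that upcast.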

Upvotes: 2
