Reputation: 39457
I came across this section in a book.
Here x is a pure NumPy array.
I think this behavior of x[x<5] in NumPy is inconsistent with Pandas, right?
I didn't know that, but it seems that in NumPy we get a one-dimensional array back from x[x<5] (even though x is 2D).
I have some prior experience with Pandas, so I would have expected to get back a 2D array (3x4), filled with NaN where the condition is not satisfied and with the actual values where it is satisfied.
Why does it behave this way (i.e. return a 1D array)? That doesn't seem very useful, does it?
Is there another NumPy function that works in a way similar to Pandas, i.e. one that does not change the shape of the original array?
Upvotes: 1
Views: 2316
Reputation: 1068
The behavior you expect can be replicated using np.where:
import numpy as np
x = 10.*np.cos( np.arange(0, 8, 0.5) ).reshape([4,4]) # Just some 4x4 test data
print(x)
#[[10. 8.77582562 5.40302306 0.70737202]
# [-4.16146837 -8.01143616 -9.89992497 -9.36456687]
# [-6.53643621 -2.10795799 2.83662185 7.08669774]
# [ 9.60170287 9.76587626 7.53902254 3.46635318]]
print(x > 5)
#[[ True True True False]
# [False False False False]
# [False False False True]
# [ True True True False]]
print(np.where(x > 5, x, np.nan))
#[[10. 8.77582562 5.40302306 nan]
# [ nan nan nan nan]
# [ nan nan nan 7.08669774]
# [ 9.60170287 9.76587626 7.53902254 nan]]
# In contrast to:
print(x[x>5])
# [10. 8.77582562 5.40302306 7.08669774 9.60170287 9.76587626 7.53902254]
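As for why x[x>5] is one-dimensional: the number of True entries differs from row to row (here 3, 0, 1 and 3), so the selected values cannot form a rectangular array. NumPy therefore returns them as a flat array in row-major (C) order; per the NumPy indexing documentation, boolean indexing x[mask] is equivalent to x[mask.nonzero()]. The following is a minimal sketch that rebuilds the same 4x4 test data and checks two equivalent formulations of the selection:

```python
import numpy as np

# Same 4x4 test data as in the answer
x = 10. * np.cos(np.arange(0, 8, 0.5)).reshape(4, 4)
mask = x > 5

# Boolean-mask indexing: the selected elements come back as a 1D array,
# ordered row by row (C order)
selected = x[mask]

# Equivalent formulations of the same selection
via_nonzero = x[np.nonzero(mask)]    # index with the (row, column) positions of the True entries
via_ravel = x.ravel()[mask.ravel()]  # flatten both arrays in C order, then mask

print(selected.shape)                # (7,)
print(np.array_equal(selected, via_nonzero), np.array_equal(selected, via_ravel))  # True True

# The True entries per row differ, so no rectangular result shape exists
print(mask.sum(axis=1))              # [3 0 1 3]
```

The flat result is useful when you only need the matching values (for example, to compute x[x>5].mean()), whereas np.where keeps the original shape.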
As you can see, np.where replaces every entry that does not satisfy the condition with np.nan. The fallback is not limited to np.nan; a fallback value of 0 is also common. In addition, np.where is useful for choosing between the entries of two matrices based on a condition, since none of its arguments needs to be a scalar.
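The following is a minimal sketch of the fallback-value and two-matrix uses of np.where. The small integer arrays a and b are hypothetical illustration data, not from the original answer. It also shows a side effect worth knowing: because integer arrays cannot hold NaN, using np.nan as the fallback upcasts the result to float.

```python
import numpy as np

# Hypothetical example data
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[10, 20, 30],
              [40, 50, 60]])

# Fallback value of 0 instead of np.nan: the shape and the integer dtype are kept
clipped = np.where(a > 2, a, 0)
print(clipped)
# [[0 0 3]
#  [4 5 6]]

# Both branches are arrays: take from a where the condition holds, otherwise from b
mixed = np.where(a % 2 == 0, a, b)
print(mixed)
# [[10  2 30]
#  [ 4 50  6]]

# With np.nan as the fallback, the result is upcast to a float dtype
with_nan = np.where(a > 2, a, np.nan)
print(with_nan.dtype)  # float64
```

All three arguments are broadcast against each other, which is why a scalar (0, np.nan) and a full array (b) both work as the fallback.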
Upvotes: 2