Reputation: 191
EDIT: Fixed values in tables.
Let's say I have a pandas dataframe df:
>>>df
a b c
0 0.016367 0.289944 -0.891527
1 1.130206 0.899758 -0.276587
2 1.390528 -1.472802 0.128979
3 0.023598 -0.931329 0.158143
4 1.401183 -0.162357 -0.959156
5 -0.127765 1.142039 -0.734434
So now I try to do some Boolean indexing:
>>>df[df > 0.5]
a b c
0 NaN NaN NaN
1 1.130206 0.899758 NaN
2 1.390528 NaN NaN
3 NaN NaN NaN
4 1.401183 NaN NaN
5 NaN 1.142039 NaN
>>>df[df < 0]
a b c
0 NaN NaN -0.891527
1 NaN NaN -0.276587
2 NaN -1.472802 NaN
3 NaN -0.931329 NaN
4 NaN -0.162357 -0.959156
5 -0.127765 NaN -0.734434
So now I try to use the logical OR of those two conditions as the indexing condition:
>>>df[df > 0.5 or df < 0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Ben\Anaconda\lib\site-packages\pandas\core\generic.py", line 692, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I've researched this a bit. It's a basic feature: the developers of numpy decided that the truth value of an array is ambiguous, because it could mean either the any case or the all case. What I don't get is why checking whether the values are > 0.5 is valid and checking whether they are < 0 is valid, but checking whether they are > 0.5 or < 0 is INVALID. I've also tried mixing up the boolean syntax, but this error is inescapable. Can someone explain why doing the OR creates an ambiguous case?
Upvotes: 1
Views: 2390
Reputation: 251428
It is not possible for custom types to override the behavior of `and` and `or` in Python. That is, it is not possible for Numpy to say that it wants `[0, 1, 1] and [1, 1, 0]` to be `[0, 1, 0]`. This is because of how the `and` operation short-circuits (see the documentation); in essence, the short-circuiting behavior of `and` and `or` means that these operations must reduce each operand to a single truth value, one operand at a time; they cannot combine their two operands in some way that makes use of data in both operands at once (for instance, to compare the elements componentwise, as would be natural for Numpy).
The solution is to use the bitwise operators `&` and `|`. However, you do have to be careful with this, since the precedence is not what you might expect: `&` and `|` bind more tightly than comparison operators such as `>` and `<`, so each comparison needs its own parentheses.
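To make the difference concrete, here is a minimal pure-Python sketch. The `Mask` class is hypothetical and only imitates the relevant behavior of a boolean array; it is not how NumPy or pandas are implemented. It shows that `&` and `|` can be overloaded to work elementwise, while `or` can only ask the left operand for a single truth value.

```python
class Mask:
    """Toy elementwise boolean container (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def __bool__(self):
        # `and`, `or` and `if` call this hook. There is no single sensible
        # answer for a whole collection, so refuse, as NumPy and pandas do.
        raise ValueError("truth value of a Mask is ambiguous")

    def __and__(self, other):
        # `&` is overloadable, so it can look at both operands elementwise.
        return Mask(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        # `|` is overloadable in the same way.
        return Mask(a or b for a, b in zip(self.values, other.values))


left = Mask([False, True, True])
right = Mask([True, True, False])

both = (left & right).values    # elementwise AND
either = (left | right).values  # elementwise OR

try:
    left or right               # Python evaluates bool(left) first
    or_error = None
except ValueError as exc:
    or_error = str(exc)

print(both, either, or_error)
```

The `or` expression fails before `right` is even looked at, which is why the error appears only once two conditions are combined and never for a single comparison.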
Upvotes: 3
Reputation: 64328
Since the logical operators (and, or, not) cannot be overridden in Python, numpy and pandas override the bitwise operators instead.
This means you need to use the bitwise-or operator, with each condition in parentheses:
df[(df > 0.5) | (df < 0)]
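As a rough check, the sketch below rebuilds the DataFrame from the question and applies the combined mask. The names `mask`, `selected`, `kept`, `any_true` and `all_true` are illustrative and not from the original post. It also shows the explicit `any` and `all` reductions that the error message mentions.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "a": [0.016367, 1.130206, 1.390528, 0.023598, 1.401183, -0.127765],
    "b": [0.289944, 0.899758, -1.472802, -0.931329, -0.162357, 1.142039],
    "c": [-0.891527, -0.276587, 0.128979, 0.158143, -0.959156, -0.734434],
})

# Elementwise OR of the two boolean DataFrames.
mask = (df > 0.5) | (df < 0)

# Cells where the mask is False become NaN.
selected = df[mask]

# Number of cells that satisfy either condition.
kept = int(mask.sum().sum())

# Explicit reductions give one unambiguous truth value for the whole frame.
any_true = bool(mask.any().any())
all_true = bool(mask.all().all())

print(selected)
print(kept, any_true, all_true)
```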
Upvotes: 1
Reputation: 394159
You need to use the bitwise or and put the conditions in parentheses:
df[(df > 0.5) | (df < 0)]
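The reason is that `or` needs a single True or False for the whole DataFrame, and that is ambiguous when only some of the values satisfy the condition: it is unclear whether "true" should mean that any value matches or that all values match.
If you reduced the mask explicitly with `any`, it would evaluate to True as long as at least one value satisfies the condition.
The parentheses are required because of operator precedence: `|` binds more tightly than the comparison operators.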
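One way to see the precedence issue without running pandas at all is to ask Python how it parses the two spellings. This is only a sketch using the standard-library `ast` module; the variable names are illustrative.

```python
import ast

# Without parentheses, `|` binds first, so the expression is parsed as the
# chained comparison  df > (0.5 | df) < 0.
tree = ast.parse("df > 0.5 | df < 0", mode="eval").body
is_chained = isinstance(tree, ast.Compare) and len(tree.ops) == 2
middle = tree.comparators[0]
middle_is_bitor = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitOr)

# With parentheses, the top-level operation is the bitwise OR of two comparisons.
fixed = ast.parse("(df > 0.5) | (df < 0)", mode="eval").body
fixed_is_bitor = isinstance(fixed, ast.BinOp) and isinstance(fixed.op, ast.BitOr)
fixed_sides_are_comparisons = (
    isinstance(fixed.left, ast.Compare) and isinstance(fixed.right, ast.Compare)
)

print(is_chained, middle_is_bitor, fixed_is_bitor, fixed_sides_are_comparisons)
```

Only the parenthesized form has the shape "comparison | comparison", which is what the boolean indexing needs.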
Example:
In [23]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 5))
df
Out[23]:
0 1 2 3 4
0 0.320165 0.123677 -0.202609 1.225668 0.327576
1 -0.620356 0.126270 1.191855 0.903879 0.214802
2 -0.974635 1.712151 1.178358 0.224962 -0.921045
3 -1.337430 -1.225469 1.150564 -1.618739 -1.297221
4 -0.093164 -0.928846 1.035407 1.766096 1.456888
In [24]:
df[(df > 0.5) | (df < 0)]
Out[24]:
0 1 2 3 4
0 NaN NaN -0.202609 1.225668 NaN
1 -0.620356 NaN 1.191855 0.903879 NaN
2 -0.974635 1.712151 1.178358 NaN -0.921045
3 -1.337430 -1.225469 1.150564 -1.618739 -1.297221
4 -0.093164 -0.928846 1.035407 1.766096 1.456888
Upvotes: 0