Reputation: 447
In a pandas DataFrame, I have a column of boolean values. To filter to rows where the value is True, I can use: df[df.column_x]
I thought that to filter to only the rows where the column is False, I could use: df[~df.column_x]. I feel like I have done this before, and have seen it as the accepted answer.
However, this fails because ~df.column_x converts the values to integers. See below.
import pandas as pd  # version 0.24.2
a = pd.Series(['a', 'a', 'a', 'a', 'b', 'a', 'b', 'b', 'b', 'b'])
b = pd.Series([True, True, True, True, True, False, False, False, False, False], dtype=bool)
c = pd.DataFrame(data=[a, b]).T
c.columns = ['Classification', 'Boolean']
print(~c.Boolean)
0 -2
1 -2
2 -2
3 -2
4 -2
5 -1
6 -1
7 -1
8 -1
9 -1
Name: Boolean, dtype: object
print(~b)
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 True
9 True
dtype: bool
Basically, I can use c[~b], but not c[~c.Boolean].
Am I just dreaming that this used to work?
Upvotes: 19
Views: 1599
Reputation: 323226
Ah, you created c with the DataFrame constructor and then transposed it with T.
First, let us look at what we have before T:
pd.DataFrame([a, b])
Out[610]:
      0     1     2     3     4      5      6      7      8      9
0     a     a     a     a     b      a      b      b      b      b
1  True  True  True  True  True  False  False  False  False  False
pandas gives each column a single dtype; when a column holds mixed types, as every column here does (a string from a and a boolean from b), it falls back to object.
After T, each column still has that object dtype. The dtypes in your c:
c.dtypes
Out[608]:
Classification object
Boolean object
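The dtype fallback can be checked directly. Below is a minimal sketch on a shortened version of your data; the names wide and tall are made up for illustration and are not in your code.

```python
import pandas as pd

a = pd.Series(['a', 'b', 'a'])
b = pd.Series([True, False, True], dtype=bool)

# Each Series becomes a row, so every column mixes a str and a bool
# and pandas stores it as object.
wide = pd.DataFrame([a, b])

# Transposing an all-object frame does not re-infer dtypes.
tall = wide.T
tall.columns = ['Classification', 'Boolean']

print(wide.dtypes)
print(tall.dtypes)
print(b.dtype)  # the standalone Series is still bool
```

So the bool dtype is lost at the constructor step, not by T itself.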
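The Boolean column became object dtype, which is why you get unexpected output for ~c.Boolean: on an object column, ~ is applied to each Python bool as an integer, so True becomes -2 and False becomes -1.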
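The integer results can be reproduced outside the DataFrame. Here is a small sketch with NumPy arrays, which is what pandas columns are backed by; flags_bool and flags_obj are illustrative names only.

```python
import numpy as np

# With a real bool dtype, ~ is an elementwise logical NOT.
flags_bool = np.array([True, False])
print(~flags_bool)

# With object dtype, ~ is handed to each Python bool. bool is a subclass
# of int, so ~x is the integer bitwise NOT, i.e. -x - 1.
flags_obj = np.array([True, False], dtype=object)
print(~flags_obj)
```

The second print gives -2 and -1, matching what you saw for ~c.Boolean.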
How to fix it? Use concat:
c = pd.concat([a, b], axis=1)
c.columns = ['Classification', 'Boolean']
~c.Boolean
Out[616]:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 True
9 True
Name: Boolean, dtype: bool
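If you already have the transposed frame and would rather not rebuild it, casting the column back to bool should also work. This is only a sketch on shortened data, and it assumes the column has no missing values (astype(bool) would turn NaN into True); false_rows is a made-up name.

```python
import pandas as pd

a = pd.Series(['a', 'a', 'b', 'b'])
b = pd.Series([True, True, False, False], dtype=bool)
c = pd.DataFrame(data=[a, b]).T
c.columns = ['Classification', 'Boolean']

# Cast the object column back to bool so ~ is a logical NOT again.
c['Boolean'] = c['Boolean'].astype(bool)

# The original filter now selects the False rows.
false_rows = c[~c['Boolean']]
print(false_rows)
```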
Upvotes: 16