Reputation: 541
I have two boolean columns A and B in a pandas dataframe, each with missing data (represented by NaN). What I want is to do an AND operation on the two columns, but I want the resulting boolean column to be NaN if either of the original columns is NaN. I have the following table:
A B
0 True True
1 True False
2 False True
3 True NaN
4 NaN NaN
5 NaN False
Now when I do df.A & df.B
I want:
0 True
1 False
2 False
3 NaN
4 NaN
5 False
dtype: bool
but instead I get:
0 True
1 False
2 False
3 True
4 True
5 False
dtype: bool
This behaviour is consistent with np.bool(np.nan) & np.bool(False) and its permutations, since NaN is truthy. What I really want, though, is a column that tells me for certain whether each row is True for both columns or could not possibly be True for both. If both values are known to be True, the result should be True; if at least one is known to be False, it should be False; otherwise the result should be NaN to show that the datum is missing.
Is there a way to achieve this?
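The rule described above is three-valued (Kleene) AND. As a minimal pure-Python sketch of the intended truth table, assuming None stands in for a missing value (the helper name kleene_and is hypothetical, not a pandas function):

```python
from typing import Optional


def kleene_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND; None marks a missing value."""
    # A known False on either side decides the result, even if the other is missing.
    if a is False or b is False:
        return False
    # No known False: any missing operand leaves the result unknown.
    if a is None or b is None:
        return None
    # Both operands are known and True.
    return True


# The six rows from the table above, with None in place of NaN.
rows = [(True, True), (True, False), (False, True),
        (True, None), (None, None), (None, False)]
results = [kleene_and(a, b) for a, b in rows]
print(results)  # [True, False, False, None, None, False]
```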
Upvotes: 15
Views: 10495
Reputation: 403030
This operation is directly supported by pandas, provided you are using the new nullable boolean type, boolean (not to be confused with the traditional NumPy bool type).
# Setup
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [True, True, False, True, np.nan, np.nan],
                   'B': [True, False, True, np.nan, np.nan, False]})
df.dtypes
A object
B object
dtype: object
# A little shortcut to convert the data type to `boolean`
df2 = df.convert_dtypes()
df2.dtypes
A boolean
B boolean
dtype: object
df2['A'] & df2['B']
0 True
1 False
2 False
3 <NA>
4 <NA>
5 False
dtype: boolean
In conclusion, this requires pandas 1.0 or later, so please consider upgrading :-)
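If you would rather not have convert_dtypes touch every column, one possible alternative is to cast only the two columns with astype('boolean'). The nullable boolean & follows Kleene logic, so a missing value combined with True stays missing, while a missing value combined with False is False. A rough sketch, where the seventh row (NaN paired with True) is added purely for illustration and is not in the original table:

```python
import numpy as np
import pandas as pd

# Original six rows plus one extra (NaN, True) row added for illustration.
df = pd.DataFrame({'A': [True, True, False, True, np.nan, np.nan, np.nan],
                   'B': [True, False, True, np.nan, np.nan, False, True]})

# Cast each object column to the nullable boolean dtype explicitly.
a = df['A'].astype('boolean')
b = df['B'].astype('boolean')

# Element-wise AND; missing values propagate unless the other side is False.
result = a & b
print(result)

# Boolean mask of the rows whose outcome is unknown.
unknown = result.isna()
```

Here unknown can be used to filter out or count the rows for which no definite answer exists.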
Upvotes: 7
Reputation: 153510
Let's use np.logical_and:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[True, True, False, True, np.nan, np.nan],
'B':[True, False, True, np.nan, np.nan, False]})
s = np.logical_and(df['A'],df['B'])
print(s)
Output:
0 True
1 False
2 False
3 NaN
4 NaN
5 False
Name: A, dtype: object
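One caveat worth checking before relying on this: on object columns, np.logical_and appears to behave like Python's `and`, returning one of its operands, and NaN is truthy. Under that assumption, a row with NaN on the left and True on the right would come out as True rather than NaN, a combination the sample data does not contain. A small check, using a two-row frame made up for illustration:

```python
import numpy as np
import pandas as pd

# Two rows that differ only in operand order; not part of the original table.
check = pd.DataFrame({'A': [np.nan, True],
                      'B': [True, np.nan]}, dtype=object)

out = np.logical_and(check['A'], check['B'])

# Row 0 is NaN-and-True, row 1 is True-and-NaN.
nan_then_true = out.iloc[0]
true_then_nan = out.iloc[1]
print(nan_then_true, true_then_nan)
```

If the first printed value is True, the result depends on operand order, and the nullable boolean dtype is the safer route.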
Upvotes: 10