bdn711
bdn711

Reputation: 27

Creating new column containing Booleans or NaN based on values in other columns in the same pandas dataframe

I want to create a new column in a pandas dataframe that evaluates to either True, False, or NaN depending on values found in two other columns in the same dataframe, which also only contain either True, False, or NaN values. Specifically, as shown below, row values in the new column should be: (a) True if either of the reference column values is True, (b) False if both reference column values are False or if they are a combination of False and NaN, or (c) NaN if both reference column values are NaN.

col_A + col_B => new_col (desired values)

Initially, as shown below, I tried assigning values to the new column where either col_A or col_B were True, but understandably it returned False whenever one or both columns contained NaN.

df[new_col] = df[col_A] | df[col_B]

I also figured out how to create a column that returns True for rows where both col_A and col_B are NaN, but I'm still struggling to figure out the next step.

df[new_col] = pd.isnull(df[col_A]) & pd.isnull(df[col_B]) 

I feel like I could eventually get the correct output using a for loop/if statements, but my understanding is that this would be super inefficient and it seems like there should be a more efficient/straightforward way to get the result I'm looking for.

Upvotes: 1

Views: 2372

Answers (1)

run-out
run-out

Reputation: 3184

Use .any

Set up the dataframe:

dict = {
    'col_A': [True, True, True, False, False, False, np.NaN, np.NaN, np.NaN], 
    'col_B': [True, False, np.NaN, False, True, np.NaN, np.NaN,True, False]   
       }
df = pd.DataFrame(dict)

print(df)

   col_A  col_B
0   True   True
1   True  False
2   True    NaN
3  False  False
4  False   True
5  False    NaN
6    NaN    NaN
7    NaN   True
8    NaN  False

Create a new column and set all values to false. We will fill in the other values.

df['new_col'] = False

Use .any() to fill in the remaining True and NaN values.

df.loc[df[['col_A', 'col_B']].any(1), 'new_col'] = True

print(df)

   col_A   col_B   new_col
0   True   True     True
1   True  False     True
2   True    NaN     True
3  False  False    False
4  False   True     True
5  False    NaN    False
6    NaN    NaN    False
7    NaN   True     True
8    NaN  False    False

Upvotes: 1

Related Questions