Vicki B

Reputation: 584

Set a pandas column Boolean value based on other columns in the row

Assume a DataFrame

    C1      C2      C3
1   NaN     NaN     NaN
2   20.1    15      200
3   NaN     12      100
4   22.5    8       80

I want to create a new column based on a summarizing boolean of the rest of the row. For example, are any of the values NaN? In that case, my new column value would be "False" for that row.

Or, perhaps, are ALL of the values NaN? In that case, I might want the new column to say False, but otherwise True (we do have some values).

I considered using df.notna() to create a Boolean DataFrame:

    C1      C2      C3
1   False   False   False
2   True    True    True
3   False   True    True
4   True    True    True

I'm sure I'm just missing something simple, but I could not come up with a way to create the fourth column based on OR-ing the existing items in each row.

Also, a generic solution would be nice, one that doesn't require building an interim DF of Booleans.

Background: I have a dataset. Nutrient values are only sampled occasionally, so many of the rows do not contain those values. I would like to have a "Nutrients Sampled" column where the value is True or False based on whether I can expect to see any nutrient sample data in this record. There are 6 possible nutrients and I don't want to check all 6 columns.

I can write the code that checks all 6 columns; I just can't seem to create a new column with a truth value.

Upvotes: 1

Views: 10971

Answers (4)

stahamtan

Reputation: 858

You can use the apply method and define a function that maps each row to a boolean.

Here is a function that you can customize to your needs (e.g. you can use all instead of any):

# if at least one of the values is NaN
def my_function(row):
    return any(row[['C1', 'C2', 'C3']].isna())

And here is how to apply it to your DataFrame and add the new column:

df['new_column'] = df.apply(my_function, axis=1)

    C1      C2      C3      new_column
0   NaN     NaN     NaN     True
1   20.1    15.0    200.0   False
2   NaN     12.0    100.0   True
3   22.5    8.0     80.0    False
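
As a minimal sketch of the "all instead of any" variant mentioned above, here is the same apply pattern run on the question's sample data. The function name all_missing and the column name are my own placeholders, not anything from the question.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (index 1..4).
df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    },
    index=[1, 2, 3, 4],
)

# Hypothetical helper: True only when every listed column is NaN in the row.
def all_missing(row):
    return bool(row[["C1", "C2", "C3"]].isna().all())

df["all_missing"] = df.apply(all_missing, axis=1)
print(df)
```

Only the first row comes out True here. Note that apply with axis=1 calls the function once per row, so on large frames a vectorized isna().all(axis=1) is usually faster.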

Upvotes: 1

BENY

Reputation: 323226

I feel like we should be using all

df['New'] = ~df.isna().all(axis=1)
df
     C1    C2     C3    New
1   NaN   NaN    NaN  False
2  20.1  15.0  200.0   True
3   NaN  12.0  100.0   True
4  22.5   8.0   80.0   True
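
For the "Nutrients Sampled" case in the question, the same idea can be limited to just the nutrient columns. This is only a sketch: the frame and the column names (depth, nitrate, phosphate, silicate) are made up for illustration, since the question does not list the real ones.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: one non-nutrient column plus three nutrient columns.
samples = pd.DataFrame(
    {
        "depth": [1.0, 2.0, 3.0],
        "nitrate": [np.nan, 0.4, np.nan],
        "phosphate": [np.nan, np.nan, np.nan],
        "silicate": [np.nan, 1.2, np.nan],
    }
)

# Assumed list of nutrient columns; adjust to the real column names.
nutrient_cols = ["nitrate", "phosphate", "silicate"]

# True when at least one nutrient value is present in the row.
samples["Nutrients Sampled"] = samples[nutrient_cols].notna().any(axis=1)
print(samples)
```

Selecting the columns first means the depth column never influences the result, and the column list only has to be written once.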

Upvotes: 2

Haleemur Ali

Reputation: 28253

You can do that with the any and all methods, which are available on the DataFrame. You just have to pass the argument axis=1 so they operate along each row.

Example:

df['C4'] = pd.notnull(df).any(axis=1)

     C1    C2     C3     C4
0   NaN   NaN    NaN  False
1  20.1  15.0  200.0   True
2   NaN  12.0  100.0   True
3  22.5   8.0   80.0   True
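
To make the difference between any and all concrete, here is a small sketch that answers both versions of the question on the sample data. The column names complete and has_any are placeholders I picked.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (index 1..4).
df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    },
    index=[1, 2, 3, 4],
)

cols = ["C1", "C2", "C3"]
present = df[cols].notna()  # Boolean frame: True where a value exists

# False if ANY value in the row is NaN (every value must be present).
df["complete"] = present.all(axis=1)

# False only if ALL values in the row are NaN (one value is enough).
df["has_any"] = present.any(axis=1)
print(df)
```

Row 3 (NaN, 12, 100) is the one that separates the two: it is not complete, but it does have some values.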

Upvotes: 2

Edward Aung

Reputation: 3512

How about:

# interim df
df = {"C1": [False, True, False, True], ...
df["C4"] = df.apply(lambda x: x.C1 or x.C2 or x.C3, axis=1)

Or ... directly as

original_df["C4"] = original_df.apply(lambda x: np.any(np.isnan(x)), axis = 1)
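
One caveat, as far as I know: np.isnan only accepts numeric input, so it can raise a TypeError when a row also contains strings or None. A sketch of a variant that relies on pandas' own missing-value check instead, using a made-up frame with a text column named site:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a text column with a numeric one.
mixed = pd.DataFrame(
    {
        "site": ["A", None, "C"],
        "C1": [np.nan, np.nan, 22.5],
    }
)

# isna() treats both NaN and None as missing, regardless of dtype.
mixed["any_missing"] = mixed.isna().any(axis=1)
print(mixed)
```

This also avoids the row-by-row apply call.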

Regards,

Upvotes: 0
