Reputation: 584
Assume a DataFrame
     C1   C2   C3
1   NaN  NaN  NaN
2  20.1   15  200
3   NaN   12  100
4  22.5    8   80
I want to create a new column based on a summarizing boolean of the rest of the row. For example, are any of the values NaN? In that case, my new column value would be "False" for that row.
Or, perhaps, are ALL of the values NaN? In that case, I might want the new column to say False, and True otherwise (when the row has at least some values).
I considered using df.notna() to create a Boolean DataFrame:
      C1     C2     C3
1  False  False  False
2   True   True   True
3  False   True   True
4   True   True   True
I'm sure I'm just missing something simple, but I could not come up with a way to create the fourth column based on OR-ing the existing items in each row.
Also, a generic solution would be nice, one that doesn't require building an interim DF of Booleans.
Background: I have a dataset. Nutrient values are only sampled occasionally, so many of the rows do not contain those values. I would like to have a "Nutrients Sampled" column where the value is True or False based on whether I can expect to see any nutrient sample data in this record. There are 6 possible nutrients and I don't want to check all 6 columns.
I can write the code that checks all 6 columns; I just can't seem to create a new column with a truth value.
Upvotes: 1
Views: 10971
Reputation: 858
You can use the apply method and define a function that maps each row to a boolean.
Here is a function that you can customize to your needs (e.g. you can use all instead of any):
# True if at least one of the values is NaN
def my_function(row):
    return any(row[['C1', 'C2', 'C3']].isna())
And here is how to apply it to your DataFrame and add a new column:
df['new_column'] = df.apply(my_function, axis=1)
     C1    C2     C3  new_column
0   NaN   NaN    NaN        True
1  20.1  15.0  200.0       False
2   NaN  12.0  100.0        True
3  22.5   8.0   80.0       False
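As a minimal sketch of the "use all instead of any" customization mentioned above, the function below returns True when a row holds at least one value, which is closer to the "Nutrients Sampled" flag the question asks for. The function name has_data and the column name has_data are hypothetical, and the data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (index 1..4).
df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    },
    index=[1, 2, 3, 4],
)

def has_data(row):
    # Hypothetical helper: False only when every listed column is NaN.
    return not row[["C1", "C2", "C3"]].isna().all()

# axis=1 passes each row to the function as a Series.
df["has_data"] = df.apply(has_data, axis=1)
print(df["has_data"].tolist())  # [False, True, True, True]
```

Calling apply with axis=1 runs the Python function once per row, so on large frames a vectorized expression is likely to be faster.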
Upvotes: 1
Reputation: 323226
I feel like we should be using all:
df['New'] = ~df.isna().all(axis=1)
df
     C1    C2     C3    New
1   NaN   NaN    NaN  False
2  20.1  15.0  200.0   True
3   NaN  12.0  100.0   True
4  22.5   8.0   80.0   True
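The expression above checks every column in the frame. For the background case in the question, where only the nutrient columns matter, the same idea can be limited to a column subset. This is a sketch under assumptions: the column names depth, nitrate, phosphate, and silicate and the name nutrients_sampled are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: "depth" is always recorded, nutrients only sometimes.
df = pd.DataFrame(
    {
        "depth": [1.0, 2.0, 3.0, 4.0],
        "nitrate": [np.nan, 20.1, np.nan, 22.5],
        "phosphate": [np.nan, 15, 12, 8],
        "silicate": [np.nan, 200, 100, 80],
    }
)

# List the nutrient columns once instead of checking each one by hand.
nutrient_cols = ["nitrate", "phosphate", "silicate"]

# True when at least one nutrient column has a value in that row.
df["nutrients_sampled"] = ~df[nutrient_cols].isna().all(axis=1)
print(df["nutrients_sampled"].tolist())  # [False, True, True, True]
```

Without the subset, the always-filled depth column would make every row look sampled.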
Upvotes: 2
Reputation: 28253
You can do that using the any and all methods, which are available on the DataFrame; you just have to pass the argument axis=1 to operate along each row.
Example:
df['C4'] = pd.notnull(df).any(axis=1)
     C1    C2     C3     C4
0   NaN   NaN    NaN  False
1  20.1  15.0  200.0   True
2   NaN  12.0  100.0   True
3  22.5   8.0   80.0   True
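To make the difference between any and all concrete, here is a small sketch that answers both variants of the question on the same data. The variable names complete and has_any are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    }
)

# "Are any of the values NaN?" -> False for rows with a gap.
complete = df.notna().all(axis=1)

# "Are ALL of the values NaN?" -> False only for fully empty rows.
has_any = df.notna().any(axis=1)

print(complete.tolist())  # [False, True, False, True]
print(has_any.tolist())   # [False, True, True, True]
```

Both results are computed before either is assigned back to the frame, so a new Boolean column does not get counted as data in the second check.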
Upvotes: 2
Reputation: 3512
How about:
# interim df
df = {"C1": [False, True, False, True], ...
df["C4"] = df.apply(lambda x: x.C1 or x.C2 or x.C3, axis=1)
Or ... directly as
original_df["C4"] = original_df.apply(lambda x: np.any(np.isnan(x)), axis = 1)
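The np.isnan version above can probably skip apply altogether by working on the underlying array. This is a sketch that assumes every column is numeric, since np.isnan raises a TypeError on object arrays such as those holding strings. Note that it flags rows that contain a NaN, so it would need to be negated for a "has data" column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
original_df = pd.DataFrame(
    {
        "C1": [np.nan, 20.1, np.nan, 22.5],
        "C2": [np.nan, 15, 12, 8],
        "C3": [np.nan, 200, 100, 80],
    }
)

# Convert once to a 2-D float array; this assumes all columns are numeric.
values = original_df.to_numpy(dtype=float)

# Reduce across the columns of each row in a single vectorized call.
original_df["C4"] = np.isnan(values).any(axis=1)
print(original_df["C4"].tolist())  # [True, False, True, False]
```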
Upvotes: 0