Reputation: 133
I would like to create a new column based on an if statement with conditions on two or more other columns in a dataframe.
For example, column3 = True if (column1 < 10.0) and (column2 > 0.0).
I have looked around and it seems that others have used the apply method with a lambda function, but I am a bit of a novice with these.
I suppose I could make two additional columns, each set to 1 in rows where the condition for that column is met, then sum them to check whether all conditions are met, but this seems a bit inelegant.
If you provide an answer with apply/lambda, let's suppose the dataframe is called sample_df and the columns are col1, col2, and col3.
Thanks so much!
Upvotes: 4
Views: 10481
Reputation: 6663
You can use eval here for brevity:
import numpy as np
import pandas as pd
# create some dummy data (random, so your values will differ)
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 2)),
                  columns=["col1", "col2"])
print(df)
   col1  col2
0     1     7
1     2     3
2     4     6
3     2     5
4     5     4
df["col3"] = df.eval("col1 < 5 and col2 > 5")
print(df)
   col1  col2   col3
0     1     7   True
1     2     3  False
2     4     6   True
3     2     5  False
4     5     4  False
You can also write it without eval via (df["col1"] < 5) & (df["col2"] > 5).
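Applied to the names from the question, the mask version might look like the sketch below. The data values are made up for illustration; only sample_df, col1, col2 and col3 come from the question. The apply/lambda variant is included for comparison, since the question asked about it, though it is usually slower on large frames.

```python
import pandas as pd

# Made-up example values; only the names come from the question.
sample_df = pd.DataFrame({"col1": [5.0, 12.0, 3.0, 9.5],
                          "col2": [1.0, 2.0, -1.0, 0.0]})

# Vectorized mask. The parentheses are needed because & binds
# more tightly than the comparison operators.
sample_df["col3"] = (sample_df["col1"] < 10.0) & (sample_df["col2"] > 0.0)

# Row-wise equivalent with apply and a lambda (axis=1 passes each row).
col3_apply = sample_df.apply(
    lambda row: row["col1"] < 10.0 and row["col2"] > 0.0, axis=1)

print(sample_df)
```

Both approaches should give the same boolean column; the mask form avoids a Python-level loop over rows.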
You may also enhance the example with np.where to explicitly set the values for the positive and negative cases right away:
df["col4"] = np.where(df.eval("col1 < 5 and col2 > 5"), "Positive Value", "Negative Value")
print(df)
   col1  col2   col3            col4
0     1     7   True  Positive Value
1     2     3  False  Negative Value
2     4     6   True  Positive Value
3     2     5  False  Negative Value
4     5     4  False  Negative Value
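Since the question mentions two or more columns, here is a rough sketch of one way to combine an arbitrary number of conditions: collect the masks in a list and reduce them with np.logical_and.reduce. The column col0, the result name all_met, and the data values are hypothetical and not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical data; col0 and all_met are illustrative names only.
df = pd.DataFrame({"col0": [1, 0, 1, 1],
                   "col1": [1, 2, 4, 5],
                   "col2": [7, 3, 6, 4]})

# One boolean mask per condition.
conditions = [df["col0"] == 1, df["col1"] < 5, df["col2"] > 5]

# True only in rows where every mask is True.
df["all_met"] = np.logical_and.reduce(conditions)

print(df)
```

For two or three conditions, chaining with & is probably just as readable; the list form mainly helps when the conditions are built programmatically.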
Upvotes: 2