Reputation: 727
I want to make a new column based on two variables. I want my new column to have the value "Good" if (column 1 >= .5 or column 2 < 0.5) and (column 1 < .5 or column 2 >= 0.5) otherwise "Bad".
I tried using lambda and if.
df["new column"] = df[["column 1", "column 2"]].apply(
lambda x, y: "Good" if (x >= 0.5 or y < 0.5) and (x < 0.5 or y >= 0.5) else "Bad"
)
I got:
TypeError: ("<lambda>() missing 1 required positional argument: 'y'", 'occurred at index column 1')
Upvotes: 1
Views: 2672
Reputation: 1
You just need to reference the columns by their index in the array you are passing to the lambda expression, like this:
df["new column"] = df[["column 1", "column 2"]].apply(
lambda x: "Good" if (x[0] >= 0.5 or x[1] < 0.5) and (x[0] < 0.5 or x[1] >= 0.5) else "Bad", axis=1
)
NOTE: don't forget to include axis=1
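To see why axis=1 matters, here is a minimal sketch (the three-row frame is made up for illustration). With the default axis=0, apply calls the function once per column and passes that whole column as a single argument, which is why a two-parameter lambda fails with the missing-argument error. With axis=1 it calls the function once per row. The sketch looks values up by column label rather than by position, since recent pandas versions warn about positional integer keys on a label-indexed Series.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"column 1": [1.0, 0.1, 0.1],
                   "column 2": [1.0, 0.7, 0.2]})

# axis=0 (default): one call per column; each Series holds one value per row.
sizes_by_column = df.apply(lambda s: len(s))

# axis=1: one call per row; each Series holds one value per column.
sizes_by_row = df.apply(lambda s: len(s), axis=1)

# Row-wise labelling, looking values up by column label.
labels = df.apply(
    lambda row: "Good"
    if (row["column 1"] >= 0.5 or row["column 2"] < 0.5)
    and (row["column 1"] < 0.5 or row["column 2"] >= 0.5)
    else "Bad",
    axis=1,
)

print(sizes_by_column.tolist())  # length seen by each per-column call
print(sizes_by_row.tolist())     # length seen by each per-row call
print(labels.tolist())
```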
Upvotes: 0
Reputation: 153460
Use np.where. pandas does intrinsic data alignment, meaning you don't need to use apply or iterate row by row; pandas will align the data on the index:
df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df
Using @YunaA.'s setup:
import numpy as np
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 0.1, 0.1],
                   'y': [1, 2, 0.7, 0.2],
                   'column 3': [1, 2, 3, 4]})
df['new column'] = np.where(((df['x'] >= .5) | (df['y'] < .5)) & ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
df
Output:
     x    y  column 3 new column
0  1.0  1.0         1       Good
1  2.0  2.0         2       Good
2  0.1  0.7         3        Bad
3  0.1  0.2         4       Good
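One way to read the rule from the question: writing A for x >= 0.5 and B for y >= 0.5, it is (A or not B) and (not A or B), which holds exactly when A and B agree. The sketch below checks that shortcut against the full expression on the same sample data. It assumes neither column contains NaN (with NaN the two forms can differ, because every comparison against NaN is False), and the names a, b, full and short are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 0.1, 0.1],
                   'y': [1, 2, 0.7, 0.2],
                   'column 3': [1, 2, 3, 4]})

a = df['x'] >= .5  # A: x is at or above the threshold
b = df['y'] >= .5  # B: y is at or above the threshold

# Full rule as stated in the question.
full = np.where((a | (df['y'] < .5)) & ((df['x'] < .5) | b), 'Good', 'Bad')

# Shortcut: "Good" when both columns fall on the same side of 0.5.
short = np.where(a == b, 'Good', 'Bad')

print(list(full))
print(list(short))
```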
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'x': np.random.random(100)*2,
                   'y': np.random.random(100)*1})
def update_column(row):
    if (row['x'] >= .5 or row['y'] <= .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"
Results
%timeit df['new column'] = np.where(((df['y'] <= .5) | (df['x'] > .5))
& ((df['x'] < .5) | (df['y'] >= .5)), 'Good', 'Bad')
1.45 ms ± 72.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df['new_column'] = df.apply(update_column, axis=1)
5.83 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
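%timeit is an IPython magic, so the lines above only run in IPython or Jupyter. As a rough sketch for a plain Python script, the standard-library timeit module can make the same comparison. The helper names with_where and with_apply are hypothetical, the conditions follow the rule stated in the question, and the timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
df = pd.DataFrame({'x': rng.random(100) * 2, 'y': rng.random(100)})

def with_where(frame):
    # Vectorised: the whole columns are compared at once.
    cond = (((frame['x'] >= .5) | (frame['y'] < .5))
            & ((frame['x'] < .5) | (frame['y'] >= .5)))
    return np.where(cond, 'Good', 'Bad')

def with_apply(frame):
    # Row by row: the lambda is called once per row.
    return frame.apply(
        lambda row: 'Good'
        if (row['x'] >= .5 or row['y'] < .5) and (row['x'] < .5 or row['y'] >= .5)
        else 'Bad',
        axis=1,
    )

# Both approaches should agree before their speed is compared.
same = list(with_where(df)) == list(with_apply(df))

runs = 100
t_where = timeit.timeit(lambda: with_where(df), number=runs)
t_apply = timeit.timeit(lambda: with_apply(df), number=runs)
print(f"np.where: {t_where / runs * 1e3:.3f} ms per call")
print(f"apply:    {t_apply / runs * 1e3:.3f} ms per call")
```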
Upvotes: 5
Reputation: 36
Try this:
import pandas as pd
def update_column(row):
    if (row['x'] >= .5 or row['y'] < .5) and (row['x'] < .5 or row['y'] >= .5):
        return "Good"
    return "Bad"
df['new_column'] = df.apply(update_column, axis=1)
Upvotes: 2
Reputation: 149
Pass the row into the lambda instead.
df['new column'] = df[['column 1', 'column 2']].apply(lambda row: "Good" if (row['column 1'] >= .5 or row['column 2'] < .5) and (row['column 1'] < .5 or row['column 2'] >= .5) else "Bad", axis=1)
Example:
import pandas as pd
df = pd.DataFrame({'column 1': [1, 2, 0.1, 0.1],
                   'column 2': [1, 2, 0.7, 0.2],
                   'column 3': [1, 2, 3, 4]})
df['new column'] = df[['column 1', 'column 2']].apply(lambda row: "Good" if (row['column 1'] >= .5 or row['column 2'] < .5) and (row['column 1'] < .5 or row['column 2'] >= .5) else "Bad", axis=1)
print(df)
Output:
   column 1  column 2  column 3 new column
0       1.0       1.0         1       Good
1       2.0       2.0         2       Good
2       0.1       0.7         3        Bad
3       0.1       0.2         4       Good
Upvotes: 2