Reputation: 165
I am trying to add a 'Delta' column to my dataframe. If the value in the 'Name' column is in a list, Delta should be df['A'] - df['B']; if the name is not in the list, it should be df['B'] - df['A'].
Here is what I have:
for i in list1:
    df['Delta'] = np.where(df['Name'] == i, np.maximum(0, df['A'] - df['B']), np.maximum(0, df['B'] - df['A']))
The problem is that it goes through each i separately, and every pass overwrites the results of the passes before it.
How can I rewrite this code so that it doesn't loop over each i, but instead just checks whether df['Name'] equals any of the i's?
Something like:
df['Delta'] = np.where(df['Name'] == any(list1), np.maximum(0, df['A'] - df['B']), np.maximum(0, df['B'] - df['A']))
If there is an overall better way to do this, please let me know.
Upvotes: 1
Views: 1001
Reputation: 71687
Use Series.isin to create a boolean mask, then pass that mask to np.where to choose between the two calculations:
diff = df['A'].sub(df['B'])
df['Delta'] = np.where(df['Name'].isin(list1), np.maximum(0, diff), np.maximum(0, -diff))
Example:
import numpy as np
import pandas as pd

np.random.seed(10)
list1 = ['a', 'c']
df = pd.DataFrame({'Name': np.random.choice(['a', 'b', 'c'], 5), 'A': np.random.randint(1, 10, 5), 'B': np.random.randint(1, 10, 5)})
diff = df['A'].sub(df['B'])
df['Delta'] = np.where(df['Name'].isin(list1), np.maximum(0, diff), np.maximum(0, -diff))
Result:
# print(df)
  Name  A  B  Delta
0    b  1  7      6
1    b  2  5      3
2    a  9  4      5
3    a  1  1      0
4    b  9  5      0
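To see why the loop in the question keeps only its last pass, here is a minimal sketch on a small hand-made frame. The sample data and the 'Loop' column name are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
list1 = ['a', 'c']
df = pd.DataFrame({'Name': ['a', 'b', 'c'], 'A': [5, 5, 5], 'B': [2, 2, 2]})
diff = df['A'].sub(df['B'])

# Loop version from the question: each pass reassigns the whole column,
# so only the comparison against the last item of list1 ('c') survives.
for i in list1:
    df['Loop'] = np.where(df['Name'] == i, np.maximum(0, diff), np.maximum(0, -diff))

# any(list1) collapses the list to a single bool, not a per-row test.
flag = any(list1)

# isin version: one mask that is True for every row whose name is in list1.
df['Delta'] = np.where(df['Name'].isin(list1), np.maximum(0, diff), np.maximum(0, -diff))

print(df)
```

On this data the 'Loop' column comes out as 0, 0, 3 (row 'a' was overwritten by the pass for 'c'), while 'Delta' is 3, 0, 3. The any(list1) call evaluates to True, which is why comparing df['Name'] against it does not do what the question hoped.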
Upvotes: 1