Mari

Reputation: 165

How do I use np.where to check whether a column value equals any item in a list?

I am trying to add a Delta column to my dataframe, based on the value in the 'Name' column. If the name is in the list, Delta is df['A'] - df['B']; if the name is not in the list, Delta is df['B'] - df['A'].

Here is what I have:

for i in list1:
    df['Delta'] = np.where(df['Name'] == i, np.maximum(0, df['A'] - df['B']), np.maximum(0, df['B'] - df['A']))

The problem is that it goes through each i separately, so every iteration overwrites the whole column, including the rows matched by the earlier i's.

How can I rewrite this code so that it doesn't loop over each i, but instead just checks whether df['Name'] equals any of the values in list1?

Something like:

df['Delta'] = np.where(df['Name'] == any(list1), np.maximum(0, df['A'] - df['B']), np.maximum(0, df['B'] - df['A']))

If there is an overall better way to do this, please let me know.

Upvotes: 1

Views: 1001

Answers (1)

Shubham Sharma

Reputation: 71687

Use Series.isin to create a boolean mask, then pass that mask to np.where to choose between the two calculations:

diff = df['A'].sub(df['B'])
df['Delta'] = np.where(df['Name'].isin(list1), np.maximum(0, diff), np.maximum(0, -diff))
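To illustrate why the `== any(list1)` attempt from the question cannot work and what the isin mask looks like, here is a minimal sketch. The `names` Series and its values are made up for illustration and are not from the question:

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the question)
list1 = ['a', 'c']
names = pd.Series(['b', 'a', 'c', 'd'])

# any(list1) collapses the whole list into a single bool: True when at
# least one element is truthy. The list contents are lost, so comparing
# a column against it cannot test membership.
collapsed = any(list1)
print(collapsed)

# isin tests each element of the Series for membership in list1
mask = names.isin(list1)
print(mask.tolist())
```

The resulting mask has one boolean per row, which is the form np.where expects as its condition.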

Example:

import numpy as np
import pandas as pd

np.random.seed(10)

list1 = ['a', 'c']
df = pd.DataFrame({'Name': np.random.choice(['a', 'b', 'c'], 5), 'A': np.random.randint(1, 10, 5), 'B': np.random.randint(1, 10, 5)})

diff = df['A'].sub(df['B'])
df['Delta'] = np.where(df['Name'].isin(list1), np.maximum(0, diff), np.maximum(0, -diff))

Result:

# print(df)
  Name  A  B  Delta
0    b  1  7      6
1    b  2  5      3
2    a  9  4      5
3    a  1  1      0
4    b  9  5      0
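As a possible alternative that stays within pandas, the same calculation can be sketched with Series.where and Series.clip instead of np.where and np.maximum. This is a different technique from the one used in the answer. The data below is typed in by hand from the Result table so that the sketch does not depend on the random seed:

```python
import pandas as pd

list1 = ['a', 'c']

# Rows copied by hand from the Result table
df = pd.DataFrame({
    'Name': ['b', 'b', 'a', 'a', 'b'],
    'A': [1, 2, 9, 1, 9],
    'B': [7, 5, 4, 1, 5],
})

diff = df['A'] - df['B']

# Keep diff where Name is in list1, use -diff elsewhere,
# then floor negative values at 0
df['Delta'] = diff.where(df['Name'].isin(list1), -diff).clip(lower=0)

print(df)
```

Series.where keeps the original value where the condition is True and takes the second argument elsewhere, so here it fills the same role as np.where.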

Upvotes: 1
