Nathan Przybylo

Reputation: 65

Creating a column based on multiple conditions

I'm a longtime SAS user trying to get into Pandas. I'd like to set a column's value based on a variety of if conditions. I think I can do it using nested np.where commands but thought I'd check if there's a more elegant solution. For instance, if I set a left bound and right bound, and want to return a column of string values for if x is left, middle, or right of these boundaries, what is the best way to do it? Basically if x < lbound return "left", else if lbound < x < rbound return "middle", else if x > rbound return "right".

df
   lbound   rbound  x
0   -1      1       0
1   5       7       1
2   0       1       2

I can check a single condition using np.where:

df['area'] = np.where(df['x']>df['rbound'],'right','somewhere else')

But I'm not sure what to do if I want to check multiple if / else-if conditions in a single line.

Output should be:

df
   lbound   rbound  x    area
0   -1      1       0    middle
1   5       7       1    left
2   0       1       2    right

Upvotes: 5

Views: 1452

Answers (2)

Vaishali

Reputation: 38415

You can use np.select instead of np.where:

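# Conditions are checked in order; the first match wins.
# between() is inclusive by default, so x equal to a bound counts as 'middle'.
cond = [df['x'].between(df['lbound'], df['rbound']),
        df['x'] < df['lbound'],
        df['x'] > df['rbound']]
output = ['middle', 'left', 'right']

# A string default ('unknown' is a placeholder) avoids mixing np.nan with
# string choices, which newer NumPy versions may reject.
df['area'] = np.select(cond, output, default='unknown')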

Output:

    lbound  rbound  x   area
0   -1      1       0   middle
1   5       7       1   left
2   0       1       2   right
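
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the np.select approach on the question's sample data. It assumes the bounds should be exclusive, as the question's wording (lbound < x < rbound) suggests, so it uses explicit comparisons rather than between(). The 'on boundary' default label is a made-up placeholder, not something from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'lbound': [-1, 5, 0],
                   'rbound': [1, 7, 1],
                   'x': [0, 1, 2]})

# Conditions are evaluated in order; the first one that is True wins.
conditions = [
    df['x'] < df['lbound'],
    (df['x'] > df['lbound']) & (df['x'] < df['rbound']),  # strictly inside
    df['x'] > df['rbound'],
]
labels = ['left', 'middle', 'right']

# 'on boundary' is a hypothetical label for rows where x equals a bound.
df['area'] = np.select(conditions, labels, default='on boundary')
print(df)
```

Adding another case is just one more entry in each list, which is the main advantage over nesting np.where calls.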

Upvotes: 2

jpp

Reputation: 164773

Option 1

You can use nested np.where statements. For example:

df['area'] = np.where(df['x'] > df['rbound'], 'right', 
                      np.where(df['x'] < df['lbound'],
                               'left', 'somewhere else'))
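
To get the exact labels the question asks for, the innermost fallback can simply be 'middle'. A small runnable sketch on the question's sample data follows. Note that a value sitting exactly on a bound also ends up as 'middle' here, which is an assumption about the desired behaviour rather than something the question specifies.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'lbound': [-1, 5, 0],
                   'rbound': [1, 7, 1],
                   'x': [0, 1, 2]})

# Each nested np.where acts as the "else" branch of the one before it:
# right of rbound -> 'right', else left of lbound -> 'left', else 'middle'.
df['area'] = np.where(df['x'] > df['rbound'], 'right',
                      np.where(df['x'] < df['lbound'], 'left', 'middle'))
print(df)
```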

Option 2

You can use the .loc accessor to assign values to the rows that match each condition. Note that you have to create the new column before using it this way. We take this opportunity to set the default value, which is then overwritten wherever a condition holds.

df['area'] = 'somewhere else'
df.loc[df['x'] > df['rbound'], 'area'] = 'right'
df.loc[df['x'] < df['lbound'], 'area'] = 'left'

Explanation

These are both valid alternatives with comparable performance. The calculations are vectorised in both instances. My preference is for Option 2 as it seems more readable. If there are a large number of nested criteria, np.where may be more convenient.
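
As a quick sanity check, the sketch below runs both options on the question's data and confirms they produce the same labels. The fourth row, where x sits exactly on lbound, is an extra row added purely for illustration, and the names opt1 and opt2 are hypothetical.

```python
import numpy as np
import pandas as pd

# Question's sample data plus one illustrative row where x == lbound.
df = pd.DataFrame({'lbound': [-1, 5, 0, 2],
                   'rbound': [1, 7, 1, 4],
                   'x': [0, 1, 2, 2]})

# Option 1: nested np.where.
opt1 = pd.Series(
    np.where(df['x'] > df['rbound'], 'right',
             np.where(df['x'] < df['lbound'], 'left', 'somewhere else')),
    index=df.index)

# Option 2: start from the default, then overwrite matching rows.
opt2 = pd.Series('somewhere else', index=df.index)
opt2.loc[df['x'] > df['rbound']] = 'right'
opt2.loc[df['x'] < df['lbound']] = 'left'

# Element-wise comparison of the two results.
print((opt1 == opt2).all())
```

Rows that match neither condition, including the boundary row, keep the default in both versions.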

Upvotes: 6
