user3097421
user3097421

Reputation: 151

Using Pandas, how do I use if/then logic against two columns to fill in another column?

I have a master spreadsheet with the following:

ClientID    Region    Number Value
61661       AMER      38150
1212        EMEA      2012
6543        AMER      38800
3115        LA        40121
3119        AMER      1000

What I want to do is add a new column to answer the question "For those in the region column with AMER or LA and the Number Value > 31,000, put the word "High" in the new column. Otherwise, if the region is AMER or LA and it's less than 31,000, put "Low". So, the new spreadsheet should look like this:

ClientID    Region    Number Value  New Column  
61661       AMER      38150         High
1212        EMEA      2012          
6543        AMER      38800         High
3115        LA        40121         High
3119        AMER      1000          Low

I took a look at https://chrisalbon.com/python/pandas_selecting_rows_on_conditions.html to get somewhat of an idea of how to do this, I tried the following:

df = pd.ExcelFile('Check.xlsx')
Master = df.parse('Clients')
HighNumber= Master['Number Value'] > 31000
region=(Master['Region']=='AMER') | (Master['Region']=='LA')

Master['New Column']= Master[HighNumber & region]

but I'm getting

ValueError: Wrong number of items passed 20, placement implies 1

I think it has something to do with me not asking it to loop through the whole dataframe to give me the values, but I'm not entirely sure. Any pointers would be fantastic!

Upvotes: 1

Views: 61

Answers (1)

piRSquared
piRSquared

Reputation: 294516

use nested np.where

import numpy as np

isin = df.Region.isin(['AMER', 'LA'])
nv = df['Number Value'] > 31000

df['New Column'] = np.where(isin & nv, 'High', np.where(isin & ~nv, 'Low', ''))

print(df)

   ClientID Region  Number Value New Column
0     61661   AMER         38150       High
1      1212   EMEA          2012           
2      6543   AMER         38800       High
3      3115     LA         40121       High
4      3119   AMER          1000        Low

We can improve performance by utilizing the underlying numpy arrays

isin = df.Region.isin(['AMER', 'LA']).values
nv = df['Number Value'].values > 31000

df['New Column'] = np.where(isin & nv, 'High', np.where(isin & ~nv, 'Low', ''))

timing

enter image description here

response to comments

isin = df.Region.isin(['AMER', 'LA']).values
nv = df['Number Value'].values > 31000
em = df.Region.values == 'EMEA'

df['New Column'] = np.where(
    isin & nv, 'High',
    np.where(
        isin & ~nv, 'Low',
        np.where(em, 'See SS 2', '')
    ))

   ClientID Region  Number Value New Column
0     61661   AMER         38150       High
1      1212   EMEA          2012   See SS 2
2      6543   AMER         38800       High
3      3115     LA         40121       High
4      3119   AMER          1000        Low

Upvotes: 1

Related Questions