Reputation: 151
I have a master spreadsheet with the following:
ClientID Region Number Value
61661 AMER 38150
1212 EMEA 2012
6543 AMER 38800
3115 LA 40121
3119 AMER 1000
What I want to do is add a new column to answer the question "For those in the region column with AMER or LA and the Number Value > 31,000, put the word "High" in the new column. Otherwise, if the region is AMER or LA and it's less than 31,000, put "Low". So, the new spreadsheet should look like this:
ClientID Region Number Value New Column
61661 AMER 38150 High
1212 EMEA 2012
6543 AMER 38800 High
3115 LA 40121 High
3119 AMER 1000 Low
I took a look at https://chrisalbon.com/python/pandas_selecting_rows_on_conditions.html to get somewhat of an idea of how to do this, I tried the following:
df = pd.ExcelFile('Check.xlsx')
Master = df.parse('Clients')
HighNumber= Master['Number Value'] > 31000
region=(Master['Region']=='AMER') | (Master['Region']=='LA')
Master['New Column']= Master[HighNumber & region]
but I'm getting
ValueError: Wrong number of items passed 20, placement implies 1
I think it has something to do with me not asking it to loop through the whole dataframe to give me the values, but I'm not entirely sure. Any pointers would be fantastic!
Upvotes: 1
Views: 61
Reputation: 294516
use nested np.where
import numpy as np
isin = df.Region.isin(['AMER', 'LA'])
nv = df['Number Value'] > 31000
df['New Column'] = np.where(isin & nv, 'High', np.where(isin & ~nv, 'Low', ''))
print(df)
ClientID Region Number Value New Column
0 61661 AMER 38150 High
1 1212 EMEA 2012
2 6543 AMER 38800 High
3 3115 LA 40121 High
4 3119 AMER 1000 Low
We can improve performance by utilizing the underlying numpy arrays
isin = df.Region.isin(['AMER', 'LA']).values
nv = df['Number Value'].values > 31000
df['New Column'] = np.where(isin & nv, 'High', np.where(isin & ~nv, 'Low', ''))
timing
response to comments
isin = df.Region.isin(['AMER', 'LA']).values
nv = df['Number Value'].values > 31000
em = df.Region.values == 'EMEA'
df['New Column'] = np.where(
isin & nv, 'High',
np.where(
isin & ~nv, 'Low',
np.where(em, 'See SS 2', '')
))
ClientID Region Number Value New Column
0 61661 AMER 38150 High
1 1212 EMEA 2012 See SS 2
2 6543 AMER 38800 High
3 3115 LA 40121 High
4 3119 AMER 1000 Low
Upvotes: 1