MrT
MrT

Reputation: 774

Concise way updating values based on column values

Background: I have a DataFrame whose values I need to update using some very specific conditions. The original implementation I inherited used a lot nested if statements wrapped in for loop, obfuscating what was going on. With readability primarily in mind, I rewrote it into this:

# Other Widgets 
df.loc[(
    (df.product == 0) & 
    (df.prod_type == 'OtherWidget') & 
    (df.region == 'US') 
    ), 'product'] = 5

# Supplier X - All clients
df.loc[(
    (df.product == 0) &
    (df.region.isin(['UK','US'])) &
    (df.supplier == 'X')
    ), 'product'] =  6

# Supplier Y - Client A 
df.loc[(
    (df.product == 0) & 
    (df.region.isin(['UK','US'])) &
    (df.supplier == 'Y') & 
    (df.client == 'A')
    ), 'product'] =  1        

# Supplier Y - Client B
df.loc[(
    (df.product == 0) & 
    (df.region.isin(['UK','US'])) &
    (df.supplier == 'Y') & 
    (df.client == 'B')
    ), 'product'] =  3

# Supplier Y - Client C
df.loc[(
    (df.product == 0) & 
    (df.region.isin(['UK','US'])) &
    (df.supplier == 'Y') & 
    (df.client == 'C')
    ), 'product'] =  4

Problem: This works well, and makes the conditions clear (in my opinion), but I'm not entirely happy because it's taking up a lot of space. Is there anyway to improve this from a readability/conciseness perspective?

Upvotes: 1

Views: 52

Answers (1)

MrT
MrT

Reputation: 774

Per EdChum's recommendation, I created a mask for the conditions. The code below goes a bit overboard in terms of masking, but it gives the general sense.

prod_0   = ( df.product == 0 )
ptype_OW = ( df.prod_type == 'OtherWidget' )
rgn_UKUS = ( df.region.isin['UK', 'US'] )
rgn_US   = ( df.region == 'US' )
supp_X   = ( df.supplier == 'X' )
supp_Y   = ( df.supplier == 'Y' )
clnt_A   = ( df.client == 'A' )
clnt_B   = ( df.client == 'B' )
clnt_C   = ( df.client == 'C' )

df.loc[(prod_0 & ptype_OW & reg_US), 'prod_0']          = 5
df.loc[(prod_0 & rgn_UKUS & supp_X), 'prod_0']          = 6
df.loc[(prod_0 & rgn_UKUS & supp_Y & clnt_A), 'prod_0'] = 1
df.loc[(prod_0 & rgn_UKUS & supp_Y & clnt_B), 'prod_0'] = 3
df.loc[(prod_0 & rgn_UKUS & supp_Y & clnt_C), 'prod_0'] = 4

Upvotes: 1

Related Questions