Gareth
Gareth

Reputation: 43

Code looks "not pythonic" - nested np.where() to add column to pd.dataframe

I have a pd dataframe which includes the columns CompTotal and CompFreq.

I wanted to add a third column- NormalizedAnnualCompensation which uses the following logic If the CompFreq is Yearly then use the exising value in CompTotal If the CompFreq is Monthly then multiply the value in CompTotal with 12 If the CompFreq is Weekly then multiply the value in CompTotal with 52

I eventually used np.where() to basically write a nested if statement like I'm used to bodging together in excel(pretty new to coding in general)- that's below.

My question is- Could I have done it better? This doesn't feel very pythonic based on what I've read and what I've been taught so far.

df['NormalizedAnnualCompensation'] = np.where(df['CompFreq']=='Yearly',df.CompTotal, 
(np.where(df['CompFreq']=='Monthly', df.CompTotal * 12,
(np.where(df['CompFreq']=='Weekly',df.CompTotal *52,'NA')

))))

Thanks in advance.

Upvotes: 4

Views: 77

Answers (2)

SeaBean
SeaBean

Reputation: 23217

np.where() is good for simple if-then-else processing. However, if you have multiple conditions to test, nesting np.where() would look complicated and difficult to read. In this case, you can get cleaner and more readable codes by using np.select(), as follows:

condlist   = [df['CompFreq']=='Yearly', df['CompFreq']=='Monthly', df['CompFreq']=='Weekly']
choicelist = [df.CompTotal,             df.CompTotal * 12,         df.CompTotal * 52]

df['NormalizedAnnualCompensation'] = np.select(condlist, choicelist, default='NA')

Upvotes: 1

Nico Albers
Nico Albers

Reputation: 1696

There is no such thing as the "proper" way to do things, so you already got the correct one!

Still, you can learn for sure by asking for different approaches (while this probably goes beyond the scope of what stackoverflow intents to be).

For example, you may consider using pandas only by using masks and accessing only some specific region of the DataFrames to be set (pd.DataFrame.loc):

df["NormalizedAnnualCompensation"] = "NA"
mask = df["CompFreq"]=="Yearly"
df.loc[mask, "NormalizedAnnualCompensation"] = df.loc[mask, "CompTotal"]
mask = df["CompFreq"]=="Monthly"
df.loc[mask, "NormalizedAnnualCompensation"] = df.loc[mask, "CompTotal"] * 12
mask = df["CompFreq"]=="Weekly"
df.loc[mask, "NormalizedAnnualCompensation"] = df.loc[mask, "CompTotal"] * 52

If you really only want to compare that column for equality and for each of the cases are filling a fixed value (i.e. CompTotal is a constant over the whole dataframe, you could consider simply using pd.Series.map , compare the following minimum example achieving a similar thing:

 In [1]: pd.Series(np.random.randint(4, size=10)).map({0: "zero", 1: "one", 2: "two"}).fillna(
    ...:     "NA"
    ...: )
 Out[1]:
0      NA
1     two
2      NA
3    zero
4     two
5    zero
6     one
7     two
8      NA
9     two
dtype: object

Upvotes: 2

Related Questions