John Taylor

Reputation: 737

Different calculations for this month compared to previous months in Pandas Dataframe

Consider sample data:

Month    Members
JUL      10
AUG      10
SEP      10

I want to add a new column which is MemberValue, but I want the column to multiply the Members value by 10 if the month is NOT the current month (currently September 2021) and 100 if the Month value is the current month. The expected output would be like this:

Month    Members   MemberValue
JUL      10        100
AUG      10        100
SEP      10        1000

I have tried various versions of conditionals and keep getting the "truth value is ambiguous" error, which we have all seen before in various contexts. I can calculate the three-letter abbreviation of the month and the numerical value of the month (i.e. SEP or 9), but using either of those in a comparison to calculate the MemberValue column yields the error. I am sure I am missing something simple, but I cannot crack this one. Thanks for taking a look.

My latest attempt which failed:

if df.index != months-1:
    df['MemberValue'] = df['Members'] * 10
else:
    df['MemberValue'] = df['Members'] * 100

Another previous attempt:

cur_month_name = str(today_date_time.strftime('%b')).upper()
if df['Month'] != cur_month_name:
    df['MemberValue'] = df['Members'] * 10
else:
    df['MemberValue'] = df['Members'] * 100

Also failed.

Upvotes: 0

Views: 503

Answers (2)

Corralien

Reputation: 120439

You can use strftime to get the abbreviated month name and np.where to choose the multiplier for each row:

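import numpy as np
import pandas as pd

# Abbreviated name of the current month, e.g. 'SEP'
cur_month_name = pd.Timestamp.today().strftime('%b').upper()
# Multiply by 100 where the month matches the current one, by 10 elsewhere
df['MemberValue'] = np.where(df['Month'] == cur_month_name,
                             df['Members']*100, df['Members']*10)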

Output:

>>> df
  Month  Members  MemberValue
0   JUL       10          100
1   AUG       10          100
2   SEP       10         1000

>>> cur_month_name
'SEP'

For your previous attempt, using the month name was a good idea, but you asked Python to evaluate a whole Series of True / False values as a single condition. To make it work, apply the test to each row rather than to the whole Series:

>>> df.apply(lambda x: x['Members']*100 if x['Month'] == cur_month_name
                                        else x['Members']*10, axis=1)

0     100
1     100
2    1000
dtype: int64
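
As a rough check, here is a minimal sketch showing that the np.where version and the row-wise apply version produce the same values. It assumes the sample data from the question and hard-codes the month name to 'SEP' (instead of deriving it from today's date) only so that the result is reproducible:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Month": ["JUL", "AUG", "SEP"], "Members": [10, 10, 10]})

# Assumption: hard-coded for reproducibility instead of using today's date
cur_month_name = "SEP"

# Vectorized: the condition is evaluated for every row at once
vectorized = np.where(df["Month"] == cur_month_name,
                      df["Members"] * 100, df["Members"] * 10)

# Row-wise: the lambda receives one row at a time, so a plain if/else works
row_wise = df.apply(
    lambda row: row["Members"] * 100 if row["Month"] == cur_month_name
    else row["Members"] * 10,
    axis=1,
)

print(vectorized.tolist())  # [100, 100, 1000]
print(row_wise.tolist())    # [100, 100, 1000]
```

On large frames the np.where form is usually the better choice, since apply with axis=1 calls the Python function once per row.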

Upvotes: 0

Roim

Reputation: 3066

curr_month_short = str.upper(pd.Timestamp.now().month_name())[0:3]
df.loc[df['Month'] != curr_month_short, 'MemberValue'] = df['Members'] * 10 
df.loc[df['Month'] == curr_month_short, 'MemberValue'] = df['Members'] * 100 

output:

Out[13]: 
  Month  Members  MemberValue
0   JUL       10        100.0
1   AUG       10        100.0
2   SEP       10       1000.0

The first line gets the short name of the month: it takes today's full month name, converts it to upper case with str.upper, and slices off the first 3 letters.
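
To illustrate, here is a small sketch of both ways of building the abbreviation. The fixed date is an assumption made only for reproducibility; in real code it would be pd.Timestamp.now(). Note that strftime('%b') follows the system locale, so its result may differ on a non-English setup, while month_name() defaults to English:

```python
import pandas as pd

# Assumption: a fixed date in September 2021 instead of pd.Timestamp.now()
ts = pd.Timestamp("2021-09-15")

# Full English month name, first three letters, upper-cased
from_month_name = ts.month_name()[:3].upper()

# Locale-dependent abbreviation, upper-cased
from_strftime = ts.strftime("%b").upper()

print(from_month_name)  # SEP
print(from_strftime)
```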

df.loc[df['Month'] != curr_month_short, 'MemberValue'] selects all the rows where df['Month'] differs from curr_month_short and assigns df['Members'] * 10 to the MemberValue column for those rows. The next line does the same for the rows where the month equals curr_month_short, using df['Members'] * 100.

As for your code: notice that df['Month'] != cur_month_name returns a Series of boolean values, True or False for each row. An if statement needs a single True or False and does not know what to do with a whole Series, hence the error. The if ... else logic can work if you apply it to a single row at a time (for example, by looping over all the rows).
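
A minimal sketch that reproduces the error, assuming the sample data from the question and a month name hard-coded to 'SEP' for reproducibility:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Month": ["JUL", "AUG", "SEP"], "Members": [10, 10, 10]})

# Assumption: hard-coded instead of derived from today's date
cur_month_name = "SEP"

# The comparison yields one boolean per row, not a single True/False
mask = df["Month"] != cur_month_name
print(mask.tolist())  # [True, True, False]

# Using the whole Series as an `if` condition raises ValueError
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```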

In my code above, passing this condition to .loc does what you intended: it selects only the rows where the condition is True and assigns the value to them.

Your code should look like this:

cur_month_name = str(today_date_time.strftime('%b')).upper()
boolean_series = df['Month'] != cur_month_name
df.loc[boolean_series, 'MemberValue'] = df['Members'] * 10
df.loc[~boolean_series, 'MemberValue'] = df['Members'] * 100

(The ~ operator inverts the boolean Series: True becomes False and False becomes True.)
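
For reference, here is a self-contained sketch of the same idea that can be run as is. It assumes the sample data from the question and hard-codes the month name to 'SEP' for reproducibility. The final cast to int is an optional addition of mine, not part of the code above; it is there only because the output shown earlier contains floats (100.0):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Month": ["JUL", "AUG", "SEP"], "Members": [10, 10, 10]})

# Assumption: hard-coded instead of derived from today's date
cur_month_name = "SEP"

# True for every row that is NOT the current month
boolean_series = df["Month"] != cur_month_name
print(boolean_series.tolist())     # [True, True, False]
print((~boolean_series).tolist())  # [False, False, True]

# Assign only to the rows selected by each mask
df.loc[boolean_series, "MemberValue"] = df["Members"] * 10
df.loc[~boolean_series, "MemberValue"] = df["Members"] * 100

# Optional: every row is filled by now, so the column can be cast to int
df["MemberValue"] = df["MemberValue"].astype(int)
print(df["MemberValue"].tolist())  # [100, 100, 1000]
```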

Upvotes: 1
