ilovewt
ilovewt

Reputation: 1023

Want to find Year on Year calculation using Groupby and apply for various years

I have a dataframe as follows:

    MARKET     PRODUCT  TIMEPERIOD  DATE    VALUES
0   USA MARKET  APPLE   QUARTER 2020-06-01  100
1   USA MARKET  APPLE   YEARLY  2020-06-01  1000
2   USA MARKET  PEAR    QUARTER 2020-06-01  200
3   USA MARKET  PEAR    YEARLY  2020-06-01  5000
4   USA MARKET  APPLE   QUARTER 2019-06-01  300
5   USA MARKET  PEAR    YEARLY  2019-06-01  2000
6   USA MARKET  PEAR    QUARTER 2019-06-01  100
7   USA MARKET  PEAR    YEARLY  2019-06-01  3000
8   USA MARKET  APPLE   QUARTER 2018-06-01  300
9   USA MARKET  PEAR    YEARLY  2018-06-01  2000
10  USA MARKET  PEAR    QUARTER 2018-06-01  100
11  USA MARKET  PEAR    YEARLY  2018-06-01  3000
12  UK MARKET   WATERMELON  QUARTER 2020-06-01  200
13  UK MARKET   WATERMELON  YEARLY  2020-06-01  5000
14  UK MARKET   GRAPE   QUARTER 2020-06-01  200
15  UK MARKET   GRAPE   YEARLY  2020-06-01  5000
16  UK MARKET   WATERMELON  QUARTER 2019-06-01  500
17  UK MARKET   WATERMELON  YEARLY  2019-06-01  300
18  UK MARKET   GRAPE   QUARTER 2019-06-01  50
19  UK MARKET   GRAPE   YEARLY  2019-06-01  500
20  UK MARKET   WATERMELON  QUARTER 2018-06-01  500
21  UK MARKET   WATERMELON  YEARLY  2018-06-01  300
22  UK MARKET   GRAPE   QUARTER 2018-06-01  50
23  UK MARKET   GRAPE   YEARLY  2018-06-01  500

I want to find the Year on Year difference of each product in each market for each timeperiod (that's a mouthful!) For example, for product APPLE in USA MARKET during the TIMEPERIOD Quarter, the 2020-06-01 growth rate is simply (100-300)/300 = 66.6% where I used the values of 2020-06-01 minus 2019-06-01 divided by 2019-06-01.

As you can see, the problem with the below code is that it only returns the growth rate of the current year vs the past year. And ignores the past year 2019 vs 2018. I have tried a few if-else blocks, but all seem to point to some errors, would appreciate it if there are any neat solutions to resolve this. In short, my growth_rate_prev is unused here (although I did try to weave it in but it failed).

def year_on_year(df):    
    try:
        curr_year_val = df[df['DATE']==max(df['DATE'])]['VALUES'].sum() 
        prev_year_val = df[df['DATE']==(max(df['DATE'])-relativedelta(months=12))]['VALUES'].sum()
        prev_prev_year_val = df[df['DATE']==(df(df['DATE'])-relativedelta(months=24))]['VALUES'].sum()
        
        growth_rate_curr = ((curr_year_val)-(prev_year_val))/(prev_year_val)
        growth_rate_prev = ((prev_year_val)-(prev_prev_year_val))/(prev_prev_year_val)
        
        
    except ZeroDivisionError:
        growth_rate_curr, growth_rate_prev = 0 , 0

        
    return growth_rate_curr


    
def product_growth(applied_group_df):            
        applied_group_df['Year on Year difference'] = year_on_year(applied_group_df)
        return applied_group_df

growth_rate_df = df_2.groupby(["TIMEPERIOD",'MARKET', 'PRODUCT']).apply(product_growth) 

In case anyone wants to reproduce the code, you can create the df by using the code below:

df_list_for_yoy = [['USA MARKET', 'APPLE', 'QUARTER', '2020-06-01', 100], ['USA MARKET', 'APPLE', 'YEARLY', '2020-06-01', 1000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2020-06-01', 200],  ['USA MARKET', 'PEAR', 'YEARLY', '2020-06-01', 5000], 
           ['USA MARKET', 'APPLE', 'QUARTER', '2019-06-01', 300],  ['USA MARKET', 'APPLE', 'YEARLY', '2019-06-01', 2000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2019-06-01', 100],  ['USA MARKET', 'PEAR', 'YEARLY', '2019-06-01', 3000],
           ['USA MARKET', 'APPLE', 'QUARTER', '2018-06-01', 300],  ['USA MARKET', 'APPLE', 'YEARLY', '2018-06-01', 2000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2018-06-01', 100],  ['USA MARKET', 'PEAR', 'YEARLY', '2018-06-01', 3000],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 200],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 5000], 
           ['UK MARKET', 'GRAPE', 'QUARTER', '2020-06-01', 200],    ['UK MARKET', 'GRAPE', 'YEARLY', '2020-06-01', 5000],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2019-06-01', 500],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2019-06-01', 300], 
           ['UK MARKET', 'GRAPE', 'QUARTER', '2019-06-01', 50],    ['UK MARKET', 'GRAPE', 'YEARLY', '2019-06-01', 500],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2018-06-01', 500],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2018-06-01', 300], 
           ['UK MARKET', 'GRAPE', 'QUARTER', '2018-06-01', 50],    ['UK MARKET', 'GRAPE', 'YEARLY', '2018-06-01', 500]]

column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'DATE', 'VALUES']
df_2 = pd.DataFrame(df_list_for_yoy, columns = column_names)
df_2['DATE']= pd.to_datetime(df_2['DATE'])

Upvotes: 1

Views: 120

Answers (3)

ilovewt
ilovewt

Reputation: 1023

I made some changes to Caina Max's answer to accommodate to my real data, where there are various months in a year. There can be 2020-06-01, 2020-03-01, 2019-12-01 etc and hence I have to make the below changes to get the combination pairs of the dates who are 1 year apart exactly, namely, [2019-06-01, 2020-06-01], [2019-03-01, 2020-03-01], [2018-12-01, 2019-12-01] etc etc.

import numpy as np
import pandas as pd
from itertools import combinations

def get_annual_growth(grp):
    # Get all possible combination of the years from dataset
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE, 2)])
    new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists if comb_dates[0]==comb_dates[1]-relativedelta(months=12)]

    # Get year-combination labels
    year_comb_strings = [comb[1] for comb in new_year_comb_lists]
    
    # Create sub-dataframe with to be concated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
    for i,years in enumerate(new_year_comb_lists ): # for each year combination ...
        actual_value, last_value = grp[grp['Date']==years[1]].Values.mean(), grp[grp['Date']==years[0]].Values.mean()
        growth = (actual_value - last_value) / last_value # calculate the annual growth
        subdf.loc[i, :] = [year_comb_strings[i], growth] 
    return subdf

df_2.groupby(['TIMEPERIOD','MARKET', 'PRODUCT']).apply(get_annual_growth)
df_2= df_2.reset_index()
df_2['Annual_Reference'] = pd.to_datetime(df_2['Annual_Reference'])

Upvotes: 1

sharathnatraj
sharathnatraj

Reputation: 1614

Please find this approach.

df = df_2.groupby(['MARKET','TIMEPERIOD','PRODUCT'])['VALUES'].apply(list).reset_index()
def func(x):
    year = 2021
    for i in range(1,len(x['VALUES'])):
        colname = str(year-i) + '-Growth'
        x[colname] = round(abs(x['VALUES'][i]- x['VALUES'][i-1])/x['VALUES'][i]*100,2)
    return x
df = df.apply(lambda x: func(x), axis=1).drop('VALUES',axis=1)
print(df)

Its a generalized code which should work for all previous years dating back to 2013 as mentioned in the comment.

Prints:

       MARKET TIMEPERIOD     PRODUCT  2020-Growth  2019-Growth
0   UK MARKET    QUARTER       GRAPE       300.00          0.0
1   UK MARKET    QUARTER  WATERMELON        60.00          0.0
2   UK MARKET     YEARLY       GRAPE       900.00          0.0
3   UK MARKET     YEARLY  WATERMELON      1566.67          0.0
4  USA MARKET    QUARTER       APPLE        66.67          0.0
5  USA MARKET    QUARTER        PEAR       100.00          0.0
6  USA MARKET     YEARLY       APPLE        50.00          0.0
7  USA MARKET     YEARLY        PEAR        66.67          0.0

Explanation:

First, I do a group by on the values and put it into a list:

df_2.groupby(['MARKET','TIMEPERIOD','PRODUCT'])['VALUES'].apply(list).reset_index()

for e.g

       MARKET TIMEPERIOD     PRODUCT              VALUES
0   UK MARKET    QUARTER       GRAPE       [200, 50, 50]
1   UK MARKET    QUARTER  WATERMELON     [200, 500, 500]
....

Then, I write an apply to loop through the 'VALUES' list column and do the growth calculation.

Upvotes: 1

You can use itertools.combinations to get the year-year combination, together with further manipulation inside a function to be applied in the groups, like this:

import numpy as np
import pandas as pd
from itertools import combinations

def get_annual_growth(grp):
    # Get all possible combination of the years from dataset
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE.dt.year, 2)])
    # Remove those combinations in which difference is greather than 1 (for example, 2018-2020)
    year_comb_lists = year_comb_lists[(np.diff(year_comb_lists) == 1).flatten()] # comment this line if it's not the case
    # Get year-combination labels
    year_comb_strings = ['-'.join(map(str, comb)) for comb in year_comb_lists]
    
    # Create sub-dataframe with to be concated afterwards by pandas `groupby`
    subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
    for i,years in enumerate(year_comb_lists): # for each year combination ...
        actual_value, last_value = grp[grp.DATE.dt.year==years[1]].VALUES.mean(), grp[grp.DATE.dt.year==years[0]].VALUES.mean()
        growth = (actual_value - last_value) / last_value # calculate the annual growth
        subdf.loc[i, :] = [year_comb_strings[i], growth] 
    return subdf

df_2.groupby(['TIMEPERIOD','MARKET', 'PRODUCT']).apply(get_annual_growth)

Output:

                                   Annual Reference Annual Growth (%)
TIMEPERIOD MARKET     PRODUCT                                        
QUARTER    UK MARKET  GRAPE      0        2019-2020               300
                                 1        2018-2019                 0
                      WATERMELON 0        2019-2020               -60
                                 1        2018-2019                 0
           USA MARKET APPLE      0        2019-2020            -66.67
                                 1        2018-2019                 0
                      PEAR       0        2019-2020               100
                                 1        2018-2019                 0
YEARLY     UK MARKET  GRAPE      0        2019-2020               900
                                 1        2018-2019                 0
                      WATERMELON 0        2019-2020           1566.67
                                 1        2018-2019                 0
           USA MARKET APPLE      0        2019-2020               -50
                                 1        2018-2019                 0
                      PEAR       0        2019-2020             66.67
                                 1        2018-2019                 0

Upvotes: 2

Related Questions