Reputation: 1023
I have a dataframe as follows:
MARKET PRODUCT TIMEPERIOD DATE VALUES
0 USA MARKET APPLE QUARTER 2020-06-01 100
1 USA MARKET APPLE YEARLY 2020-06-01 1000
2 USA MARKET PEAR QUARTER 2020-06-01 200
3 USA MARKET PEAR YEARLY 2020-06-01 5000
4 USA MARKET APPLE QUARTER 2019-06-01 300
5 USA MARKET PEAR YEARLY 2019-06-01 2000
6 USA MARKET PEAR QUARTER 2019-06-01 100
7 USA MARKET PEAR YEARLY 2019-06-01 3000
8 USA MARKET APPLE QUARTER 2018-06-01 300
9 USA MARKET PEAR YEARLY 2018-06-01 2000
10 USA MARKET PEAR QUARTER 2018-06-01 100
11 USA MARKET PEAR YEARLY 2018-06-01 3000
12 UK MARKET WATERMELON QUARTER 2020-06-01 200
13 UK MARKET WATERMELON YEARLY 2020-06-01 5000
14 UK MARKET GRAPE QUARTER 2020-06-01 200
15 UK MARKET GRAPE YEARLY 2020-06-01 5000
16 UK MARKET WATERMELON QUARTER 2019-06-01 500
17 UK MARKET WATERMELON YEARLY 2019-06-01 300
18 UK MARKET GRAPE QUARTER 2019-06-01 50
19 UK MARKET GRAPE YEARLY 2019-06-01 500
20 UK MARKET WATERMELON QUARTER 2018-06-01 500
21 UK MARKET WATERMELON YEARLY 2018-06-01 300
22 UK MARKET GRAPE QUARTER 2018-06-01 50
23 UK MARKET GRAPE YEARLY 2018-06-01 500
I want to find the Year on Year difference of each product in each market for each timeperiod (that's a mouthful!) For example, for product APPLE in USA MARKET during the TIMEPERIOD Quarter, the 2020-06-01 growth rate is simply (100-300)/300 = 66.6% where I used the values of 2020-06-01 minus 2019-06-01 divided by 2019-06-01.
As you can see, the problem with the below code is that it only returns the growth rate of the current year vs the past year. And ignores the past year 2019 vs 2018. I have tried a few if-else
blocks, but all seem to point to some errors, would appreciate it if there are any neat solutions to resolve this. In short, my growth_rate_prev
is unused here (although I did try to weave it in but it failed).
def year_on_year(df):
try:
curr_year_val = df[df['DATE']==max(df['DATE'])]['VALUES'].sum()
prev_year_val = df[df['DATE']==(max(df['DATE'])-relativedelta(months=12))]['VALUES'].sum()
prev_prev_year_val = df[df['DATE']==(df(df['DATE'])-relativedelta(months=24))]['VALUES'].sum()
growth_rate_curr = ((curr_year_val)-(prev_year_val))/(prev_year_val)
growth_rate_prev = ((prev_year_val)-(prev_prev_year_val))/(prev_prev_year_val)
except ZeroDivisionError:
growth_rate_curr, growth_rate_prev = 0 , 0
return growth_rate_curr
def product_growth(applied_group_df):
applied_group_df['Year on Year difference'] = year_on_year(applied_group_df)
return applied_group_df
growth_rate_df = df_2.groupby(["TIMEPERIOD",'MARKET', 'PRODUCT']).apply(product_growth)
In case anyone wants to reproduce the code, you can create the df by using the code below:
df_list_for_yoy = [['USA MARKET', 'APPLE', 'QUARTER', '2020-06-01', 100], ['USA MARKET', 'APPLE', 'YEARLY', '2020-06-01', 1000],
['USA MARKET', 'PEAR', 'QUARTER', '2020-06-01', 200], ['USA MARKET', 'PEAR', 'YEARLY', '2020-06-01', 5000],
['USA MARKET', 'APPLE', 'QUARTER', '2019-06-01', 300], ['USA MARKET', 'APPLE', 'YEARLY', '2019-06-01', 2000],
['USA MARKET', 'PEAR', 'QUARTER', '2019-06-01', 100], ['USA MARKET', 'PEAR', 'YEARLY', '2019-06-01', 3000],
['USA MARKET', 'APPLE', 'QUARTER', '2018-06-01', 300], ['USA MARKET', 'APPLE', 'YEARLY', '2018-06-01', 2000],
['USA MARKET', 'PEAR', 'QUARTER', '2018-06-01', 100], ['USA MARKET', 'PEAR', 'YEARLY', '2018-06-01', 3000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 200], ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 5000],
['UK MARKET', 'GRAPE', 'QUARTER', '2020-06-01', 200], ['UK MARKET', 'GRAPE', 'YEARLY', '2020-06-01', 5000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2019-06-01', 500], ['UK MARKET', 'WATERMELON', 'YEARLY', '2019-06-01', 300],
['UK MARKET', 'GRAPE', 'QUARTER', '2019-06-01', 50], ['UK MARKET', 'GRAPE', 'YEARLY', '2019-06-01', 500],
['UK MARKET', 'WATERMELON', 'QUARTER', '2018-06-01', 500], ['UK MARKET', 'WATERMELON', 'YEARLY', '2018-06-01', 300],
['UK MARKET', 'GRAPE', 'QUARTER', '2018-06-01', 50], ['UK MARKET', 'GRAPE', 'YEARLY', '2018-06-01', 500]]
column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'DATE', 'VALUES']
df_2 = pd.DataFrame(df_list_for_yoy, columns = column_names)
df_2['DATE']= pd.to_datetime(df_2['DATE'])
Upvotes: 1
Views: 120
Reputation: 1023
I made some changes to Caina Max's answer to accommodate to my real data, where there are various months in a year. There can be 2020-06-01, 2020-03-01, 2019-12-01 etc and hence I have to make the below changes to get the combination pairs of the dates who are 1 year apart exactly, namely, [2019-06-01, 2020-06-01], [2019-03-01, 2020-03-01], [2018-12-01, 2019-12-01] etc etc.
import numpy as np
import pandas as pd
from itertools import combinations
def get_annual_growth(grp):
# Get all possible combination of the years from dataset
year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE, 2)])
new_year_comb_lists = [comb_dates for comb_dates in year_comb_lists if comb_dates[0]==comb_dates[1]-relativedelta(months=12)]
# Get year-combination labels
year_comb_strings = [comb[1] for comb in new_year_comb_lists]
# Create sub-dataframe with to be concated afterwards by pandas `groupby`
subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
for i,years in enumerate(new_year_comb_lists ): # for each year combination ...
actual_value, last_value = grp[grp['Date']==years[1]].Values.mean(), grp[grp['Date']==years[0]].Values.mean()
growth = (actual_value - last_value) / last_value # calculate the annual growth
subdf.loc[i, :] = [year_comb_strings[i], growth]
return subdf
df_2.groupby(['TIMEPERIOD','MARKET', 'PRODUCT']).apply(get_annual_growth)
df_2= df_2.reset_index()
df_2['Annual_Reference'] = pd.to_datetime(df_2['Annual_Reference'])
Upvotes: 1
Reputation: 1614
Please find this approach.
df = df_2.groupby(['MARKET','TIMEPERIOD','PRODUCT'])['VALUES'].apply(list).reset_index()
def func(x):
year = 2021
for i in range(1,len(x['VALUES'])):
colname = str(year-i) + '-Growth'
x[colname] = round(abs(x['VALUES'][i]- x['VALUES'][i-1])/x['VALUES'][i]*100,2)
return x
df = df.apply(lambda x: func(x), axis=1).drop('VALUES',axis=1)
print(df)
Its a generalized code which should work for all previous years dating back to 2013 as mentioned in the comment.
Prints:
MARKET TIMEPERIOD PRODUCT 2020-Growth 2019-Growth
0 UK MARKET QUARTER GRAPE 300.00 0.0
1 UK MARKET QUARTER WATERMELON 60.00 0.0
2 UK MARKET YEARLY GRAPE 900.00 0.0
3 UK MARKET YEARLY WATERMELON 1566.67 0.0
4 USA MARKET QUARTER APPLE 66.67 0.0
5 USA MARKET QUARTER PEAR 100.00 0.0
6 USA MARKET YEARLY APPLE 50.00 0.0
7 USA MARKET YEARLY PEAR 66.67 0.0
Explanation:
First, I do a group by on the values and put it into a list:
df_2.groupby(['MARKET','TIMEPERIOD','PRODUCT'])['VALUES'].apply(list).reset_index()
for e.g
MARKET TIMEPERIOD PRODUCT VALUES
0 UK MARKET QUARTER GRAPE [200, 50, 50]
1 UK MARKET QUARTER WATERMELON [200, 500, 500]
....
Then, I write an apply to loop through the 'VALUES' list column and do the growth calculation.
Upvotes: 1
Reputation: 4929
You can use itertools.combinations
to get the year-year combination, together with further manipulation inside a function to be applied in the groups, like this:
import numpy as np
import pandas as pd
from itertools import combinations
def get_annual_growth(grp):
# Get all possible combination of the years from dataset
year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE.dt.year, 2)])
# Remove those combinations in which difference is greather than 1 (for example, 2018-2020)
year_comb_lists = year_comb_lists[(np.diff(year_comb_lists) == 1).flatten()] # comment this line if it's not the case
# Get year-combination labels
year_comb_strings = ['-'.join(map(str, comb)) for comb in year_comb_lists]
# Create sub-dataframe with to be concated afterwards by pandas `groupby`
subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
for i,years in enumerate(year_comb_lists): # for each year combination ...
actual_value, last_value = grp[grp.DATE.dt.year==years[1]].VALUES.mean(), grp[grp.DATE.dt.year==years[0]].VALUES.mean()
growth = (actual_value - last_value) / last_value # calculate the annual growth
subdf.loc[i, :] = [year_comb_strings[i], growth]
return subdf
df_2.groupby(['TIMEPERIOD','MARKET', 'PRODUCT']).apply(get_annual_growth)
Output:
Annual Reference Annual Growth (%)
TIMEPERIOD MARKET PRODUCT
QUARTER UK MARKET GRAPE 0 2019-2020 300
1 2018-2019 0
WATERMELON 0 2019-2020 -60
1 2018-2019 0
USA MARKET APPLE 0 2019-2020 -66.67
1 2018-2019 0
PEAR 0 2019-2020 100
1 2018-2019 0
YEARLY UK MARKET GRAPE 0 2019-2020 900
1 2018-2019 0
WATERMELON 0 2019-2020 1566.67
1 2018-2019 0
USA MARKET APPLE 0 2019-2020 -50
1 2018-2019 0
PEAR 0 2019-2020 66.67
1 2018-2019 0
Upvotes: 2