I want to create a graph that will display the cumulative average revenue for each 'Year Onboarded' (first customer transaction) over a period of time. But I am making mistakes when grouping the information I need. Toy Data: dataset = {'ClientId': [1,2,3,1,2,3,1,2,3,1,2,3,4,4,4,4,4,4,4], 'Year Onboarded': [2018,2019,2020,2018,2019,2020,2018,2019,2020,2018,2019,2020,2016,2016,2016,2016,2016,2016,2016], 'Year': [2019,2019,2020,2019,2019,2020,2018,2020,2020,2020,2019,2020,2016,2017,2018,2019,2020,2017,2018], 'Revenue': [100,50,25,30,40,50,60,100,20,40,100,20,5,5,8,4,10,20,8]} df = pd.DataFrame(data=dataset) Explanation: Customers have a designated 'Year Onboarded' and they make a transaction every 'Year' mentioned. Then I calculate the years that have elapsed since the clients onboarded in order to make my graph visually more appealing. df['Yearsdiff'] = df['Year']-df['Year Onboarded'] To calculate the Cumulative Average Revenue I tried the following methods: First try: df = df.join(df.groupby(['Year']).expanding().agg({ 'Revenue': 'mean'}) .reset_index(level=0, drop=True) .add_suffix('_roll')) df.groupby(['Year Onboarded', 'Year']).last().drop(columns=['Revenue']) The output starts to be cumulative but the last row isn't cumulative anymore (not sure why). Second Try: df.groupby(['Year Onboarded','Year']).agg('mean') \ .groupby(level=[1]) \ .agg({'Revenue':np.cumsum}) But it doesn't work properly, I tried other ways as well but didn't achieve good results. To visualize the cumulative average revenue I simply use sns.lineplot My goal is to get a graph similar as the one below but for that I first need to group my data correctly. Expected output plot The Years that we can see on the graph represent the 'Year Onboarded' not the 'Year'. Can someone help me calculate a Cumulative Average Revenue that works in order to plot a graph similar to the one above? Thank you Also the data provided in the toy dataset will surely not give something similar to the example plot but the idea should be there.

Reputation: 145

How to calculate Cumulative Average Revenue ? Python

I want to create a graph that will display the cumulative average revenue for each 'Year Onboarded' (first customer transaction) over a period of time. But I am making mistakes when grouping the information I need.

Toy Data:

dataset = {'ClientId': [1,2,3,1,2,3,1,2,3,1,2,3,4,4,4,4,4,4,4],
           'Year Onboarded': [2018,2019,2020,2018,2019,2020,2018,2019,2020,2018,2019,2020,2016,2016,2016,2016,2016,2016,2016],
           'Year': [2019,2019,2020,2019,2019,2020,2018,2020,2020,2020,2019,2020,2016,2017,2018,2019,2020,2017,2018],
           'Revenue': [100,50,25,30,40,50,60,100,20,40,100,20,5,5,8,4,10,20,8]}
df = pd.DataFrame(data=dataset)

Explanation: Customers have a designated 'Year Onboarded' and they make a transaction every 'Year' mentioned. Then I calculate the years that have elapsed since the clients onboarded in order to make my graph visually more appealing.

df['Yearsdiff'] = df['Year']-df['Year Onboarded']

To calculate the Cumulative Average Revenue I tried the following methods:

First try:

df = df.join(df.groupby(['Year']).expanding().agg({ 'Revenue': 'mean'})
               .reset_index(level=0, drop=True)
               .add_suffix('_roll'))

df.groupby(['Year Onboarded', 'Year']).last().drop(columns=['Revenue'])

The output starts to be cumulative but the last row isn't cumulative anymore (not sure why).

Second Try:

df.groupby(['Year Onboarded','Year']).agg('mean') \
                                     .groupby(level=[1]) \
                                     .agg({'Revenue':np.cumsum})

But it doesn't work properly, I tried other ways as well but didn't achieve good results.

To visualize the cumulative average revenue I simply use sns.lineplot

My goal is to get a graph similar as the one below but for that I first need to group my data correctly.

Expected output plot

The Years that we can see on the graph represent the 'Year Onboarded' not the 'Year'.

Can someone help me calculate a Cumulative Average Revenue that works in order to plot a graph similar to the one above? Thank you

Also the data provided in the toy dataset will surely not give something similar to the example plot but the idea should be there.

Upvotes: 3

Answers (2)

zipa

Reputation: 27869

You can create rolling mean like this:

df['rolling_mean'] = df.groupby(['Year Onboarded'])['Revenue'].apply(lambda x: x.rolling(10, 1).mean())

df

#     ClientId  Year Onboarded  Year  Revenue rolling_mean
# 0          1            2018  2019      100   100.000000
# 1          2            2019  2019       50    50.000000
# 2          3            2020  2020       25    25.000000
# 3          1            2018  2019       30    65.000000
# 4          2            2019  2019       40    45.000000
# 5          3            2020  2020       50    37.500000
# 6          1            2018  2018       60    63.333333
# 7          2            2019  2020      100    63.333333
# 8          3            2020  2020       20    31.666667
# 9          1            2018  2020       40    57.500000
# 10         2            2019  2019      100    72.500000
# 11         3            2020  2020       20    28.750000
# 12         4            2016  2016        5     5.000000
# 13         4            2016  2017        5     5.000000
# 14         4            2016  2018        8     6.000000
# 15         4            2016  2019        4     5.500000
# 16         4            2016  2020       10     6.400000
# 17         4            2016  2017       20     8.666667
# 18         4            2016  2018        8     8.571429

Upvotes: 0

Celius Stingher

Reputation: 18367

This is how I would do it and considering the toy data is not the same, probably some changes should be done, but all in all:

import seaborn as sns
df1 = df.copy() 
df1['Yearsdiff'] = df1['Year']-df1['Year Onboarded']
df1['Revenue'] = df.groupby(['Year Onboarded'])['Revenue'].transform('mean') 
#Find the average revenue per Year Onboarded
df1['Revenue'] = df1.groupby(['Yearsdiff'])['Revenue'].transform('cumsum')
#Calculate the cumulative sum of Revenue (Which is now the average per Year Onboarded) per Yearsdiff (because this will be our X-axis in the plot)   
sns.lineplot(x=df1['Yearsdiff'],y=df1['Revenue'],hue=df1['Year'])
#Finally plot the data, using the column 'Year' as hue to account for the different years.

Upvotes: 0

How to calculate Cumulative Average Revenue ? Python

Answers (2)

Related Questions