nlieb7
nlieb7

Reputation: 91

Python: Calculate 5-year rolling CAGR of values that need to be grouped from a dataframe

I have a dataframe with historical market caps for which I need to compute their 5-year compound annual growth rates (CAGRs). However, the dataframe has hundreds of companies with 20 years of values each, so I need to be able to isolate each company's data to compute their CAGRs. How do I go about doing this?

The function to calculate a CAGR is: (end/start)^(1/# years)-1. I have never used .groupby() or .apply(), so I don't know how to implement the CAGR equation for rolling values.

Here is a screenshot of part of the dataframe so you have a visual representation of what I am trying to use: Screeshot of dataframe.

Any guidance would be greatly appreciated!

Upvotes: 1

Views: 5657

Answers (3)

Arseni
Arseni

Reputation: 33

I'm only four years late here and can't comment yet, so adding this as an answer.

It's probably the newer Python & Pandas causing this but this now fires Assertion Error, as the .groupby adds the grouping key to the output.

df["cagr"] = df.groupby("company").apply(lambda x, period: ((x.pct_change(period) + 1) ** (1/period)) - 1, cagr_period)

Simple fix to this is to use group_keys=False

df["cagr"] = df.groupby("company", group_keys=False).apply(lambda x, period: ((x.pct_change(period) + 1) ** (1/period)) - 1, cagr_period)

Upvotes: 0

Andi
Andi

Reputation: 4899

Setting up a toy example:

import numpy as np
import pandas as pd

idx_level_0 = np.repeat(["company1", "company2", "company3"], 5)
idx_level_1 = np.tile([2015, 2016, 2017, 2018, 2019], 3)
values = np.random.randint(low=1, high=100, size=15)

df = pd.DataFrame({"values": values}, index=[idx_level_0, idx_level_1])
df.index.names = ["company", "year"]
print(df)

               values
company  year        
company1 2015      19
         2016      61
         2017      87
         2018      55
         2019      46
company2 2015       1
         2016      68
         2017      50
         2018      93
         2019      84
company3 2015      11
         2016      84
         2017      54
         2018      21
         2019      55

I suggest to use groupby to group by individual companies. You then could apply your computation via a lambda function. The result is basically a one-liner.

# actual computation for a two-year period
cagr_period = 2
df["cagr"] = df.groupby("company").apply(lambda x, period: ((x.pct_change(period) + 1) ** (1/period)) - 1, cagr_period)
print(df)


               values      cagr
company  year                  
company1 2015      19       NaN
         2016      61       NaN
         2017      87  1.139848
         2018      55 -0.050453
         2019      46 -0.272858
company2 2015       1       NaN
         2016      68       NaN
         2017      50  6.071068
         2018      93  0.169464
         2019      84  0.296148
company3 2015      11       NaN
         2016      84       NaN
         2017      54  1.215647
         2018      21 -0.500000
         2019      55  0.009217

Upvotes: 2

Tim
Tim

Reputation: 3427

Assuming there is 1 value per company per year. You can reduce the date to year. This is a lot simpler. No need for groupby or apply.

Say your dataframe is name df. First, reduce date to year:

df['year'] = df['Date'].dt.year

Second, add year+5

df['year+5'] = df['year'] + 5

Third, merge the 'df' with itself:

df_new = pandas.merge(df, df, how='inner', left_on=['Instrument', 'year'], right_on=['Instrument','year+5'], suffixes=['_start', '_end'])

Finally, calculate rolling CAGR

df_new['CAGR'] = (df_new['Company Market Cap_end']/df_new['Company Market Cap_start'])**(0.2)-1

Upvotes: 2

Related Questions