Ultima
Ultima

Reputation: 23

How to calculate the yearly cumulative percent change

So I have been studying the SP500 yearly returns with information downloaded from my quandl subscription. I have used resample() and pct_change() to study the data but my results are not coming as to what is expected for some reason.

sp500_df = quandl.get("MULTPL/SP500_REAL_PRICE_MONTH", authtoken="YOUR OWN AUTH KEY")
sp500_Y_ret_df = sp500_df['Value'].resample('Y').mean().pct_change().dropna()

The expected value for the SP 500 return for year ending 2008 should be -38.5% but my code is showing -17% for some reason? If for some reason you cannot access the data I can provide a .csv file for the data. Thanks a million for the help.

sp500_Y_ret_df.loc['2008-12-31']

output:

-0.17319465450687388

last 20 years:

sp500_Y_ret_df.tail(20)

output:

2001-12-31   -0.164631
2002-12-31   -0.164795
2003-12-31   -0.032081
2004-12-31    0.173145
2005-12-31    0.067678
2006-12-31    0.085836
2007-12-31    0.126625
2008-12-31   -0.173195
2009-12-31   -0.224552
2010-12-31    0.203406
2011-12-31    0.113738
2012-12-31    0.087221
2013-12-31    0.190603
2014-12-31    0.175436
2015-12-31    0.067610
2016-12-31    0.014868
2017-12-31    0.170363
2018-12-31    0.121093
2019-12-31    0.065247
2020-12-31    0.061747
Freq: A-DEC, Name: Value, dtype: float64

USING Random made data:

aapl_df = pd.DataFrame({ 
'ticker':np.repeat( ['aapl'], 2500 ),
'date':pd.date_range('1/1/2011', periods=2500, freq='D'),
'price':(np.random.randn(2500).cumsum() + 10) }).set_index('date')
aapl_df.head()

date        
2011-01-01  aapl    9.011290
2011-01-02  aapl    9.092603
2011-01-03  aapl    9.139830
2011-01-04  aapl    7.782112
2011-01-05  aapl    8.316270

using 'last' as stated yielded closer results but not sure if that is pure luck

aapl_Y_ret_df = aapl_df['price'].resample('Y').last()
aapl_Y_ret_df.tail()

output

    date
2013-12-31    18.535328
2014-12-31    15.201832
2015-12-31    36.040411
2016-12-31    42.272464
2017-12-31    20.421079
Freq: A-DEC, Name: price, dtype: float64

--

aapl_Y_ret_df = aapl_df['price'].resample('Y').last().pct_change()
aapl_Y_ret_df.tail()
date
2013-12-31    0.569359
2014-12-31   -0.179846
2015-12-31    1.370794
2016-12-31    0.172918
2017-12-31   -0.516918
Freq: A-DEC, Name: price, dtype: float64

Upvotes: 1

Views: 4401

Answers (2)

Trenton McKinney
Trenton McKinney

Reputation: 62373

  • This answers the question of how to calculate the yearly cumulative percent change of the 'Adj Close', which is what the OP wants.
  • Calculate the yearly return by finding the daily percent change in the Close or Adj Close, and then sum and multiply by 100.
  • Use groupby and DataFrameGroupBy.pct_change to get the values by year.
  • df['Adj Close'].resample('Y').mean() returns the mean of the 'Adj Close' values for each year, which is not how to determine the yearly return.
    • The percent change of the mean close from 2007 to 2008 is -17.4%. This is not the return.
  • Tested in python 3.11.2, pandas 2.0.0
import yfinance as yf
import pandas as pd

# load S&P 500 data
df = yf.download('^gspc', start='2000-01-01', end='2020-01-01').reset_index()

# display(df)
        Date         High          Low         Open        Close      Volume    Adj Close
0 2000-01-03  1478.000000  1438.359985  1469.250000  1455.219971   931800000  1455.219971
1 2000-01-04  1455.219971  1397.430054  1455.219971  1399.420044  1009000000  1399.420044
2 2000-01-05  1413.270020  1377.680054  1399.420044  1402.109985  1085500000  1402.109985
3 2000-01-06  1411.900024  1392.099976  1402.109985  1403.449951  1092300000  1403.449951
4 2000-01-07  1441.469971  1400.729980  1403.449951  1441.469971  1225200000  1441.469971

# groupby year and determine the daily percent change by year, and add it as a column to df
df['pct_ch'] = df.groupby(df.Date.dt.year)['Adj Close'].pct_change()

# groupby year and aggregate sum of pct_ch to get the yearly return
yearly_pct_ch = df.groupby(df.Date.dt.year)['pct_ch'].sum().mul(100).reset_index().rename(columns={'pct_ch': 'cum_pct_ch_year'})

# display(yearly_pct_ch)
    Date  cum_pct_ch_year
0   2000        -7.274088
1   2001        -8.890805
2   2002       -23.811947
3   2003        21.552072
4   2004         9.535574
5   2005         4.295586
6   2006        11.626670
7   2007         4.860178
8   2008       -38.702107
9   2009        21.622674
10  2010        12.052038
11  2011         1.575069
12  2012        11.840560
13  2013        24.012739
14  2014        12.320664
15  2015         0.501799
16  2016        11.494988
17  2017        17.127082
18  2018        -5.822426
19  2019        26.031938

Upvotes: 3

Kadam Parikh
Kadam Parikh

Reputation: 442

The approved answer is actually the wrong answer. When it comes to financial data, a little of mathematical knowledge is required as well.

Let's assume you have daily returns. To calculate yearly returns, you should actually compound the daily returns.

For example, let's say you invested $10 on day1. The returns for next 2 days are 10% and 10% respectively.

  1. Your final value on the end of day1 will be 10% of $10 i.e. $11.
  2. Your final value on the end of day2 will be 10% of $11 i.e $12.1.
  3. Returns is $2.1.

Now, if you simply sum up values of returns, you will get the wrong answer. 10% + 10% = 20%. And, 20% of $10 is $2. This is what the author has suggested in the above answer, which is wrong.

Instead, follow the below steps:

  1. Standardize daily returns by dividing them by 100, i.e. 10%/100 = 0.1.
  2. Add 1 to the standardized daily returns, i.e. 1 + 0.1. You will get 1.1 and 1.1 respectively, for 2 days' returns.
  3. Multiply all the daily returns i.e. 1.1 * 1.1 = 1.21.
  4. Subtract 1 from the product i.e. 1.21 - 1 = 0.21.
  5. Multiply the value by 100 i.e. 0.21*100 = 21%.

This is your final return after 2 days. Let's check: 21% of $10 is $2.1.

Pandas code:

# only if needed
df['daily_returns'] /= 100.0

daily_returns_compounded = df['daily_returns'] + 1

annual_return = daily_returns_compounded.prod()

annual_return -= - 1

# only if needed
annual_return *= 100 

If you have multiple years of data, use groupby()

Upvotes: -1

Related Questions