Reputation: 23
So I have been studying the SP500 yearly returns with information downloaded from my quandl subscription. I have used resample() and pct_change() to study the data but my results are not coming as to what is expected for some reason.
sp500_df = quandl.get("MULTPL/SP500_REAL_PRICE_MONTH", authtoken="YOUR OWN AUTH KEY")
sp500_Y_ret_df = sp500_df['Value'].resample('Y').mean().pct_change().dropna()
The expected value for the SP 500 return for year ending 2008 should be -38.5% but my code is showing -17% for some reason? If for some reason you cannot access the data I can provide a .csv file for the data. Thanks a million for the help.
sp500_Y_ret_df.loc['2008-12-31']
output:
-0.17319465450687388
last 20 years:
sp500_Y_ret_df.tail(20)
output:
2001-12-31 -0.164631
2002-12-31 -0.164795
2003-12-31 -0.032081
2004-12-31 0.173145
2005-12-31 0.067678
2006-12-31 0.085836
2007-12-31 0.126625
2008-12-31 -0.173195
2009-12-31 -0.224552
2010-12-31 0.203406
2011-12-31 0.113738
2012-12-31 0.087221
2013-12-31 0.190603
2014-12-31 0.175436
2015-12-31 0.067610
2016-12-31 0.014868
2017-12-31 0.170363
2018-12-31 0.121093
2019-12-31 0.065247
2020-12-31 0.061747
Freq: A-DEC, Name: Value, dtype: float64
USING Random made data:
aapl_df = pd.DataFrame({
'ticker':np.repeat( ['aapl'], 2500 ),
'date':pd.date_range('1/1/2011', periods=2500, freq='D'),
'price':(np.random.randn(2500).cumsum() + 10) }).set_index('date')
aapl_df.head()
date
2011-01-01 aapl 9.011290
2011-01-02 aapl 9.092603
2011-01-03 aapl 9.139830
2011-01-04 aapl 7.782112
2011-01-05 aapl 8.316270
using 'last' as stated yielded closer results but not sure if that is pure luck
aapl_Y_ret_df = aapl_df['price'].resample('Y').last()
aapl_Y_ret_df.tail()
output
date
2013-12-31 18.535328
2014-12-31 15.201832
2015-12-31 36.040411
2016-12-31 42.272464
2017-12-31 20.421079
Freq: A-DEC, Name: price, dtype: float64
--
aapl_Y_ret_df = aapl_df['price'].resample('Y').last().pct_change()
aapl_Y_ret_df.tail()
date
2013-12-31 0.569359
2014-12-31 -0.179846
2015-12-31 1.370794
2016-12-31 0.172918
2017-12-31 -0.516918
Freq: A-DEC, Name: price, dtype: float64
Upvotes: 1
Views: 4401
Reputation: 62373
'Adj Close'
, which is what the OP wants.Close
or Adj Close
, and then sum
and multiply by 100.groupby
and DataFrameGroupBy.pct_change
to get the values by year.df['Adj Close'].resample('Y').mean()
returns the mean of the 'Adj Close'
values for each year, which is not how to determine the yearly return.
-17.4%
. This is not the return.python 3.11.2
, pandas 2.0.0
import yfinance as yf
import pandas as pd
# load S&P 500 data
df = yf.download('^gspc', start='2000-01-01', end='2020-01-01').reset_index()
# display(df)
Date High Low Open Close Volume Adj Close
0 2000-01-03 1478.000000 1438.359985 1469.250000 1455.219971 931800000 1455.219971
1 2000-01-04 1455.219971 1397.430054 1455.219971 1399.420044 1009000000 1399.420044
2 2000-01-05 1413.270020 1377.680054 1399.420044 1402.109985 1085500000 1402.109985
3 2000-01-06 1411.900024 1392.099976 1402.109985 1403.449951 1092300000 1403.449951
4 2000-01-07 1441.469971 1400.729980 1403.449951 1441.469971 1225200000 1441.469971
# groupby year and determine the daily percent change by year, and add it as a column to df
df['pct_ch'] = df.groupby(df.Date.dt.year)['Adj Close'].pct_change()
# groupby year and aggregate sum of pct_ch to get the yearly return
yearly_pct_ch = df.groupby(df.Date.dt.year)['pct_ch'].sum().mul(100).reset_index().rename(columns={'pct_ch': 'cum_pct_ch_year'})
# display(yearly_pct_ch)
Date cum_pct_ch_year
0 2000 -7.274088
1 2001 -8.890805
2 2002 -23.811947
3 2003 21.552072
4 2004 9.535574
5 2005 4.295586
6 2006 11.626670
7 2007 4.860178
8 2008 -38.702107
9 2009 21.622674
10 2010 12.052038
11 2011 1.575069
12 2012 11.840560
13 2013 24.012739
14 2014 12.320664
15 2015 0.501799
16 2016 11.494988
17 2017 17.127082
18 2018 -5.822426
19 2019 26.031938
Upvotes: 3
Reputation: 442
The approved answer is actually the wrong answer. When it comes to financial data, a little of mathematical knowledge is required as well.
Let's assume you have daily returns. To calculate yearly returns, you should actually compound the daily returns.
For example, let's say you invested $10 on day1. The returns for next 2 days are 10% and 10% respectively.
Now, if you simply sum up values of returns, you will get the wrong answer. 10% + 10% = 20%. And, 20% of $10 is $2. This is what the author has suggested in the above answer, which is wrong.
Instead, follow the below steps:
10%/100 = 0.1
.1 + 0.1
. You will get 1.1 and 1.1 respectively, for 2 days' returns.1.1 * 1.1 = 1.21
.1.21 - 1 = 0.21
.0.21*100 = 21%
.This is your final return after 2 days. Let's check: 21% of $10 is $2.1.
Pandas code:
# only if needed
df['daily_returns'] /= 100.0
daily_returns_compounded = df['daily_returns'] + 1
annual_return = daily_returns_compounded.prod()
annual_return -= - 1
# only if needed
annual_return *= 100
If you have multiple years of data, use groupby()
Upvotes: -1