Reputation: 1190
Not able to get the same results with resampled data. For example:
import yfinance as yf
import pandas as pd
df = yf.download('f', interval = '1mo')
df = df[~df.index.duplicated(keep='last')]
# df = df.resample('CM').mean()
df['pct'] = df['Close'].pct_change()
print(df)
Produces this DataFrame
Open High Low Close Adj Close Volume pct
Date
1972-10-25 NaN NaN NaN NaN NaN NaN NaN
1973-01-23 NaN NaN NaN NaN NaN NaN NaN
1973-07-24 NaN NaN NaN NaN NaN NaN NaN
1973-10-23 NaN NaN NaN NaN NaN NaN NaN
1975-04-24 NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
2022-03-01 17.41 18.250 15.51 16.91 16.798235 1.749488e+09 -0.037016
2022-04-01 17.01 17.150 13.90 14.16 14.066411 1.341340e+09 -0.162626
2022-05-01 14.02 15.000 12.07 13.68 13.680000 1.359644e+09 -0.033898
2022-06-01 13.88 13.970 13.36 13.50 13.500000 1.372803e+08 -0.013158
2022-06-03 13.63 13.775 13.36 13.50 13.500000 4.362525e+07 0.000000
I have resampled a daily data with 'MS' calendar month begin
. This is so the resample data matches the default data we get using interval = '1mo'
df = yf.download('f')
df = df[~df.index.duplicated(keep='last')]
df = df.resample('MS').mean()
df['pct'] = df['Close'].pct_change()
print(df)
And this produces
Date ...
1972-06-01 2.015902 2.127047 2.098477 ... 0.267396 1.545780e+06 NaN
1972-07-01 2.053871 2.071511 2.039882 ... 0.260849 1.061645e+06 -0.027089
1972-08-01 2.155512 2.171732 2.140879 ... 0.276273 1.683096e+06 0.050896
1972-09-01 2.124429 2.136392 2.110845 ... 0.271796 9.511339e+05 -0.016205
1972-10-01 2.118384 2.128706 2.106587 ... 0.271777 1.152049e+06 -0.002440
... ... ... ... ... ... ...
2022-02-01 18.114211 18.490000 17.770526 ... 18.019060 9.611196e+07 -0.194397
2022-03-01 16.770435 17.036521 16.379565 ... 16.583577 7.606470e+07 -0.079665
2022-04-01 15.530500 15.796500 15.176500 ... 15.341886 6.706698e+07 -0.076340
2022-05-01 13.387619 13.679048 13.084762 ... 13.387619 6.474497e+07 -0.131773
2022-06-01 13.716667 13.903333 13.453333 ... 13.646667 4.576010e+07 0.019350
[601 rows x 7 columns]
When comparing same date in these two DataFrames one can see data is not matching. I am especially interested with the last column which is pct_change()
as they are way off.
2022-05-01 14.02 15.000 12.07 13.68 13.680000 1.359644e+09 -0.03389
By specifying interval = '1mo'
2022-05-01 13.387619 13.679048 13.084762 ... 13.387619 6.474497e+07 -0.131773
By way of resampling data from daily to Monthly.
I was earlier under the impression that it was due to the fact that the resampled data is sampled at the end of the month compared to Yahoo data which starts at the beginning of the month. However even with resampling at start of calendar month, data is not matching. Could you please advise why is the resampled data not matching the monthly data?
Upvotes: 0
Views: 311
Reputation: 545
You are taking different sampling, in the first:
df = yf.download('f', interval = '1mo')
You are taking one month candle, the open price is for the first day 01/07/2022 while the close price is the last day from that month, for instance 07/06/2022, today.
df = yf.download('f')
Although in the pypi page says the default interval is "1mo", I have tested it and appears to be "1d" (business days).
So, the mean is calculated for all the days in the corresponding month. This line is averaging the close price for all the days of the month. Which usually is undesired.
df2.resample('MS').mean()
df2 = yf.download('f')
df2 = df2[~df2.index.duplicated(keep='last')]
# df2 = df2.resample('MS').first()
# df2['pct'] = df2['Close'].pct_change()
df2.tail(10)
Here I use ...agg.first()
to track what is the first value of the aggregation data.
Then print more rows for the df2 (with default interval):
Notice it match with the lowest low value of the months' days and take the last close price (yesterday).
Be aware of defining the interval and take in mind than the mean of the month days doesn't relate with the candle prices for a month. You should use min(low)
and last(close)
instead of mean(low)
and mean(close)
.
df2.resample('MS').agg({'Open':'first','High':'max','Low': 'min','Close':'last'})
Upvotes: 1