Reputation: 91
I am using the yfinance library to import data for a given stock. See code below:
import yfinance as yf
from datetime import datetime as dt
import pandas as pd
# Naming Constants
stock = "AAPL"
start_date = "2014-01-01"
end_date = "2018-01-01"
# Importing all the data into a dataFrame
stock_data = yf.download(stock, start=start_date, end=end_date)
When I call print(stock_data.index)
I have the following:
DatetimeIndex(['2014-01-02', '2014-01-03', '2014-01-06', '2014-01-07', '2014-01-08', '2014-01-09', '2014-01-10', '2014-01-13', '2014-01-14', '2014-01-15',
...
'2017-12-15', '2017-12-18', '2017-12-19', '2017-12-20', '2017-12-21', '2017-12-22', '2017-12-26', '2017-12-27', '2017-12-28', '2017-12-29'],
dtype='datetime64[ns]', name='Date', length=1007, freq=None)
I wish to switch the frequency argument from None to daily since every Date refers to a trading day.
When I say stock_data.index.freq = 'B'
I get the following error:
ValueError: Inferred frequency None from passed values does not conform to passed frequency B
And if I put stock_data = stock_data.asfreq('B'), it will change the frequency but it will add certain lines that were not there originally and fills them with NA values.
In other words, what is the offset ALIAS used for trading days?
You can find the list of alias from the Pandas documentation here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
Upvotes: 0
Views: 3087
Reputation: 12345
The error with stock_data.index.freq = 'B'
indicates that your timeseries frequency is not 'business-day', but undefined or 'None'.
With
stock_data = stock_data.asfreq('B')
your are re-indexing your timeseries to business-daily frequency: The missing timestamps will be added, and the missing stock data values are set to NaN. Now you need to decide how replace them, so have a look here: pandas.DataFrame.asfreq. So you could replace all NaN's with a fixed value like -999, but in general what you want to do with stock data is take the last valid value at a given point in time, which is forward filling the gaps:
stock_data = stock_data.asfreq('B', method='ffill')
It's always worth reading the docs.
Upvotes: 3