I've got a pandas DataFrame with a datetime index whose frequency is set to "C" (custom business day):
ipdb> data.index
DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
'2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
'2021-03-17', '2021-03-18',
...
'2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
'2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
'2021-11-18', '2021-11-19'],
dtype='datetime64[ns]', name='mktDates', length=180, freq='C')
The index was created using the pandas bdate_range function:
import datetime as dat
import pandas as pd
# load the market holidays and convert them to datetimes
holidays = pd.read_csv('../data/raw/market_holidays.csv', parse_dates=True, infer_datetime_format=True)
holidays = pd.to_datetime(holidays['date_YYYY_MM_DD'], format='%Y-%m-%d')
sttDate = dat.datetime(2013, 1, 1)
stpDate = dat.datetime(2021, 12, 31)
# build the calendar
mktCalendar = pd.bdate_range(start=sttDate, end=stpDate, holidays=holidays.values, freq='C').rename('mktDates')
I'm trying to fit an ARIMA model with statsmodels using the code:
import statsmodels.api as sm
thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')
The last line throws an exception:
<ipython-input-392-acbc7f25591c> in ARIMASimulate(data, simParams, randSeed, verbose)
27 # fit and get the score
28 ipdb.set_trace()
---> 29 arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\model.py in __init__(self, endog, exog, order, seasonal_order, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
107 >>> print(res.summary())
108 """
--> 109 def __init__(self, endog, exog=None, order=(0, 0, 0),
110 seasonal_order=(0, 0, 0, 0), trend=None,
111 enforce_stationarity=True, enforce_invertibility=True,
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\specification.py in __init__(self, endog, exog, order, seasonal_order, ar_order, diff, ma_order, seasonal_ar_order, seasonal_diff, seasonal_ma_order, seasonal_periods, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
444 # especially validating shapes, retrieving names, and potentially
445 # providing us with a time series index
--> 446 self._model = TimeSeriesModel(endog, exog=exog, dates=dates, freq=freq,
447 missing=missing)
448 self.endog = None if faux_endog else self._model.endog
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in __init__(self, endog, exog, dates, freq, missing, **kwargs)
413
414 # Date handling in indexes
--> 415 self._init_dates(dates, freq)
416
417 def _init_dates(self, dates=None, freq=None):
~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in _init_dates(self, dates, freq)
555 elif (freq is not None and not inferred_freq and
556 not (index.freq == freq)):
--> 557 raise ValueError('The given frequency argument is'
558 ' incompatible with the given index.')
559 # Finally, raise an exception if we could not coerce to date-based
ValueError: The given frequency argument is incompatible with the given index
I don't understand this, as the frequency argument is the same as that of the data index. I also know that the index is not missing any dates as per the frequency. I've got statsmodels 0.12.1. Any idea what is going on here?
Upvotes: 1
If I generate a DatetimeIndex with freq='C' from 2021-03-05 to 2021-11-19, its length is 186. Your index has length 180, so 6 dates are missing relative to a plain 'C' range:
import pandas as pd
date_range = pd.date_range(
start='2021-03-05',
end='2021-11-19',
freq='C'
)
print(date_range)
DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
'2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
'2021-03-17', '2021-03-18',
...
'2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
'2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
'2021-11-18', '2021-11-19'],
dtype='datetime64[ns]', length=186, freq='C')
Using this date_range with ARIMA gives no error:
import numpy as np
import statsmodels.api as sm
x = np.linspace(0, 2*np.pi, date_range.size)
y = np.sin(4*np.pi*x)
data = pd.DataFrame({
'Y': y,
}, index=date_range)
thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(
endog=data, order=thisOrder,
freq='C'
)
So you may need to check your DataFrame index.
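One way to do that check, as a rough sketch: build a plain 'C' range over the same span and diff it against your index. The holiday dates below are made up for illustration (I don't have your market_holidays.csv), and custom_idx, plain_idx, missing and same_freq are just names I picked. The last comparison mirrors the index.freq == freq test visible in your traceback, assuming the string 'C' ends up as a default offset with no holidays.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical holidays, standing in for the contents of market_holidays.csv
holidays = ["2021-04-02", "2021-05-31", "2021-07-05", "2021-09-06"]

# Index built the way the question builds it: 'C' frequency plus holidays
custom_idx = pd.bdate_range(
    start="2021-03-05", end="2021-11-19", freq="C", holidays=holidays
)

# Plain 'C' range with no holidays, as generated earlier in this answer
plain_idx = pd.date_range(start="2021-03-05", end="2021-11-19", freq="C")

# Dates present in the plain range but absent from the custom index
missing = plain_idx.difference(custom_idx)
print(len(plain_idx), len(custom_idx))
print(missing)

# Both offsets display as 'C', but the one carrying holidays
# does not compare equal to the default one
same_freq = custom_idx.freq == to_offset("C")
print(same_freq)
```

If the missing dates turn out to be exactly your holidays, the index is complete with respect to your own calendar, and the mismatch comes only from the freq comparison.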
Upvotes: 1