Dr. Andrew

Reputation: 2621

Statsmodels ARIMA date index frequency

I've got a pandas DataFrame with a datetime index whose frequency is set to "C" (custom business day):

ipdb>  data.index
DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
               '2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
               '2021-03-17', '2021-03-18',
               ...
               '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
               '2021-11-18', '2021-11-19'],
              dtype='datetime64[ns]', name='mktDates', length=180, freq='C')

The index was created using the pandas bdate_range function:

import datetime as dat
import pandas as pd

holidays = pd.read_csv('../data/raw/market_holidays.csv', parse_dates=True, infer_datetime_format=True)
holidays = pd.to_datetime(holidays['date_YYYY_MM_DD'], format='%Y-%m-%d')

sttDate = dat.datetime(2013, 1, 1)
stpDate = dat.datetime(2021, 12, 31)

# build the calendar
mktCalendar = pd.bdate_range(start=sttDate, end=stpDate, holidays=holidays.values, freq='C').rename('mktDates')

I'm trying to fit an ARIMA model with statsmodels using the code:

import statsmodels.api as sm
thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')

The last line throws an exception:

<ipython-input-392-acbc7f25591c> in ARIMASimulate(data, simParams, randSeed, verbose)
     27         # fit and get the score
     28         ipdb.set_trace()
---> 29         arima = sm.tsa.arima.ARIMA(endog=data, order=thisOrder, freq='C')

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\model.py in __init__(self, endog, exog, order, seasonal_order, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
    107     >>> print(res.summary())
    108     """
--> 109     def __init__(self, endog, exog=None, order=(0, 0, 0),
    110                  seasonal_order=(0, 0, 0, 0), trend=None,
    111                  enforce_stationarity=True, enforce_invertibility=True,

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\arima\specification.py in __init__(self, endog, exog, order, seasonal_order, ar_order, diff, ma_order, seasonal_ar_order, seasonal_diff, seasonal_ma_order, seasonal_periods, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
    444         # especially validating shapes, retrieving names, and potentially
    445         # providing us with a time series index
--> 446         self._model = TimeSeriesModel(endog, exog=exog, dates=dates, freq=freq,
    447                                       missing=missing)
    448         self.endog = None if faux_endog else self._model.endog

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in __init__(self, endog, exog, dates, freq, missing, **kwargs)
    413 
    414         # Date handling in indexes
--> 415         self._init_dates(dates, freq)
    416 
    417     def _init_dates(self, dates=None, freq=None):

~\Anaconda3\envs\pybakken\lib\site-packages\statsmodels\tsa\base\tsa_model.py in _init_dates(self, dates, freq)
    555                 elif (freq is not None and not inferred_freq and
    556                         not (index.freq == freq)):
--> 557                     raise ValueError('The given frequency argument is'
    558                                      ' incompatible with the given index.')
    559             # Finally, raise an exception if we could not coerce to date-based

ValueError: The given frequency argument is incompatible with the given index

I don't understand this, as the frequency argument is the same as the frequency of the data index. I have also checked that the index is not missing any dates for that frequency. I'm using statsmodels 0.12.1. Any idea what is going on here?

Upvotes: 1

Views: 2492

Answers (1)

Max Pierini

Reputation: 2269

Generating a DatetimeIndex with freq='C' from 2021-03-05 to 2021-11-19 gives 186 dates. Your index has 180, so 6 dates are missing relative to the default 'C' calendar (weekdays only, no holidays):

import pandas as pd

date_range = pd.date_range(
    start='2021-03-05',
    end='2021-11-19',
    freq='C'
)

print(date_range)

DatetimeIndex(['2021-03-05', '2021-03-08', '2021-03-09', '2021-03-10',
               '2021-03-11', '2021-03-12', '2021-03-15', '2021-03-16',
               '2021-03-17', '2021-03-18',
               ...
               '2021-11-08', '2021-11-09', '2021-11-10', '2021-11-11',
               '2021-11-12', '2021-11-15', '2021-11-16', '2021-11-17',
               '2021-11-18', '2021-11-19'],
              dtype='datetime64[ns]', length=186, freq='C')
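To see exactly which dates differ between the two calendars, one option is Index.difference. The following is a minimal sketch, not your actual data: the three holidays are hypothetical placeholders, since the contents of market_holidays.csv are not shown in the question.

```python
import pandas as pd

# Hypothetical holiday list, for illustration only. The real list lives in
# market_holidays.csv, which is not shown in the question.
holidays = ['2021-04-02', '2021-05-31', '2021-07-05']

# Default custom business-day calendar: weekdays only, no holidays.
full_range = pd.date_range(start='2021-03-05', end='2021-11-19', freq='C')

# Calendar that also skips the (hypothetical) holidays above.
mkt_range = pd.bdate_range(start='2021-03-05', end='2021-11-19',
                           freq='C', holidays=holidays)

# Dates present in the default calendar but absent from the holiday-aware one.
missing = full_range.difference(mkt_range)

print(len(full_range), len(mkt_range))
print(missing)
```

With your own index in place of mkt_range, the same difference call would list the 6 dates in question.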

Using this date_range with ARIMA raises no error:

import numpy as np
import statsmodels.api as sm

x = np.linspace(0, 2*np.pi, date_range.size)
y = np.sin(4*np.pi*x)

data = pd.DataFrame({
    'Y': y,
}, index=date_range)

thisOrder = (1, 1, 1)
arima = sm.tsa.arima.ARIMA(
    endog=data, order=thisOrder, 
    freq='C'
)

So you may need to check your DataFrame index.
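One plausible explanation for the missing dates, given that your index was built with a holidays argument: the traceback shows statsmodels testing index.freq == freq, and a custom business-day offset that carries holidays does not compare equal to the plain 'C' offset, even though both print as 'C'. The sketch below illustrates this with pandas only; the holiday dates are again hypothetical.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Hypothetical holidays, standing in for the asker's market_holidays.csv.
holidays = ['2021-04-02', '2021-05-31', '2021-07-05']

idx = pd.bdate_range(start='2021-03-05', end='2021-11-19',
                     freq='C', holidays=holidays)

# The offset that the string 'C' resolves to: no holidays attached.
plain_c = to_offset('C')

# Both offsets share the same frequency string...
print(idx.freqstr, plain_c.freqstr)

# ...but equality also takes the holiday list into account.
print(idx.freq == plain_c)

# Converting the index's own offset returns an equal offset.
print(idx.freq == to_offset(idx.freq))
```

If this is what is happening in your case, passing freq=data.index.freq, or leaving freq out so that the index's own frequency is used, would be expected to get past that check. That is an assumption based on the comparison shown in the traceback, not something verified against statsmodels 0.12.1 here.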

Upvotes: 1
