Dr Dro

Reputation: 147

Predicting a time series with statsmodels VAR and encountering a ValueWarning

I'm trying to forecast future values from my monthly dataset (each observation is dated the first day of its month, 12 per year) and I'm getting this warning:

ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.

I've searched Google and Stack Overflow but couldn't find a relevant thread or a good enough solution.

This is head(13) of my dataframe:

            Occupancy rate  Average Price     RevPAR
Date                                                
2013-01-01        0.579026     105.289497  60.965332
2013-02-01        0.637415     109.396682  69.731070
2013-03-01        0.714847     117.840534  84.237901
2013-04-01        0.716446     122.765139  87.954593
2013-05-01        0.771097     105.461387  81.320985
2013-06-01        0.768777     115.252163  88.603262
2013-07-01        0.677020      81.824781  55.396987
2013-08-01        0.673639      72.489988  48.832110
2013-09-01        0.783291     125.034417  97.938296
2013-10-01        0.779694     118.724648  92.568902
2013-11-01        0.771430     113.322446  87.420366
2013-12-01        0.680166     100.950857  68.663388
2014-01-01        0.573320     102.881633  58.984090

And this is the basic fit I'm trying to run as a first step:

from statsmodels.tsa.api import VAR

model = VAR(df)
results = model.fit(2)  # fit a VAR with 2 lags
results.forecast(df.values[-2:], 5)  # forecast 5 steps ahead from the last 2 observations
results.summary()

I'm assuming I need to set some kind of frequency attribute on the dataframe. I tried a brute-force df.asfreq('M'), but it simply messes up my data.

Upvotes: 4

Views: 3098

Answers (1)

Woods Chen

Reputation: 620

I don't know the model you are using, but the warning is most likely caused either by missing values in the time series or by a mismatched frequency (the frequency alias for month start is MS).
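To check which of the two causes applies, a minimal sketch along these lines may help. The four-row frame and its column name x are made up for illustration, and the offset objects are used instead of string aliases only because the month-end alias is 'M' in older pandas and 'ME' in newer releases.

```python
import pandas as pd

# Hypothetical sample: month-start dates parsed from strings, as after reading a CSV.
idx = pd.to_datetime(["2013-01-01", "2013-02-01", "2013-03-01", "2013-04-01"])
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Parsing dates does not attach a frequency to the index.
print(df.index.freq)

# pandas can still detect the spacing from the dates themselves.
print(pd.infer_freq(df.index))

# Month-end frequency matches none of the month-start dates,
# so every row of the result is NaN.
month_end = df.asfreq(pd.offsets.MonthEnd())

# Month-start frequency matches every date, so the values are kept
# and the resulting index carries the frequency.
month_start = df.asfreq(pd.offsets.MonthBegin())

print(month_end)
print(month_start)
```

If the inferred frequency comes back as None, the index probably has gaps or irregular dates, which points to the missing-values cause instead.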

I think you can create a new date index with pd.date_range and then reindex the dataframe with it.

If the input dataframe is:

In [10]: df
Out[10]:
            0  1
2018-01-01  2  1
2018-03-01  0  0

we can then create a new date index:

In [12]: index = pd.date_range(start=df.index.min(), end=df.index.max(), freq='MS')

In [13]: index
Out[13]: DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01'], dtype='datetime64[ns]', freq='MS')

then reindex the dataframe:

In [14]: df.reindex(index)
Out[14]:
            0    1
2018-01-01  2.0  1.0
2018-02-01  NaN  NaN
2018-03-01  0.0  0.0

Additionally, we can fill the NaN values in the dataframe with appropriate values so that the model can be trained.

Upvotes: 4
