Aidan L.

Reputation: 89

Ordering timestamps for ARIMA model prediction

Info:

As a test, and to keep things simple, I am trying to predict the price of Bitcoin one day after the most recent datetime in my data. So t = 05/27/2020 and t + 1 = 05/28/2020.

So I loaded my data:

x = pd.read_csv('btcdata.csv', header=0, parse_dates=['Date'], index_col=0)
close = x.Close

This is what close.head() looks like:

Date
2020-05-27    8854.32
2020-05-26    8844.42
2020-05-25    8899.31
2020-05-24    8715.73
2020-05-23    9181.76

There is a small problem with this: the most recent date is at the top and the oldest date is at the bottom. Time series are usually ordered the other way around, and that seems to be how the ARIMA model reads them.

So when I fit the model and predict with .forecast(), this was output[0]:

[381.59648517]

Which actually matches more to the .tail() of my data:

Date
2014-12-05    377.1
2014-12-04    377.1
2014-12-03    378.0
2014-12-02    378.0
2014-12-01    370.0

Question/Problem:

How do I get around this and order the data so that the ARIMA model knows which date is my most recent date t, and predicts for t + 1?

Also, every time I fit my model I get these two warnings, which may be relevant to the problem:

ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
ValueWarning: A date index has been provided, but it is not monotonic and so will be ignored when e.g. forecasting.

Upvotes: 1

Views: 1648

Answers (2)

Derek O

Reputation: 19600

The warning "ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting." means that the index carries no frequency, so the model cannot tell how far apart the observations are or which date comes after the last one.

This converts the index to a PeriodIndex with daily frequency:

x.index = pd.DatetimeIndex(x.index).to_period('D')

The warning "ValueWarning: A date index has been provided, but it is not monotonic and so will be ignored when e.g. forecasting." means that the index isn't sorted in increasing date order, so sort it with this line:

x = x.sort_index()
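Putting the two fixes together, here is a minimal sketch using only pandas. The inline CSV is a stand-in for btcdata.csv built from the five rows shown in the question, and I am assuming the file has just the Date and Close columns. Note that close has to be taken from x again after the index is fixed, otherwise it still carries the old unsorted index.

```python
import io
import pandas as pd

# Stand-in for btcdata.csv (assumed layout), newest row first as in the question.
csv_text = """Date,Close
2020-05-27,8854.32
2020-05-26,8844.42
2020-05-25,8899.31
2020-05-24,8715.73
2020-05-23,9181.76
"""

x = pd.read_csv(io.StringIO(csv_text), header=0, parse_dates=["Date"], index_col=0)

# Oldest date first, so the last row is the most recent observation t.
x = x.sort_index()

# Attach a daily frequency by converting the index to a PeriodIndex.
x.index = pd.DatetimeIndex(x.index).to_period("D")

# Re-extract the column so it picks up the corrected index.
close = x["Close"]

print(close.index.is_monotonic_increasing)  # the index is now sorted
print(close.index.freqstr)                  # the frequency attached to the index
print(close.index[-1], "->", close.index[-1] + 1)  # t and t + 1
```

With the index in this shape, the last row is t and the index itself knows that the next step is the following day, which is what the two warnings were complaining about.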

Upvotes: 3

Bill

Reputation: 11658

If the problem is simply that your data is not sorted in increasing date order, then this should work.

Data sorted by date, oldest first:

prices = x.sort_index()

Or, if you only want the 5 most recent data points:

latest_prices = x.sort_index().iloc[-5:]

Upvotes: 1
