Brian

Reputation: 117

Differencing and Autocorrelation Function plots x-axis extends far beyond dataset range

I have a series of Sales data indexed by the Date of each record, with the data ranging from 2013 to 2015. I want to fit an ARIMA model to this time series.

display(store1_train['Sales'])

Output:

Date
2013-01-01       0
2013-01-02    5530
2013-01-03    4327
2013-01-04    4486
2013-01-05    4997
              ... 
2015-06-26    3317
2015-06-27    4019
2015-06-28       0
2015-06-29    5197
2015-06-30    5735

However, when I run the following code to plot the differenced series and their autocorrelation functions, the x-axis runs from 1970 to 2015. Why does this happen, and how can I control the x-axis range? I've tried specifying the lags parameter for plot_acf, but it has no effect. I'm quite a novice; I found this code in a tutorial and adapted it to my needs.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plt.rcParams.update({'figure.figsize':(7,7), 'figure.dpi':120})

# Original Series
fig, axes = plt.subplots(3, 2, sharex=True)
axes[0, 0].plot(store1_train.Sales); axes[0, 0].set_title('Original Series')
plot_acf(store1_train.Sales, ax=axes[0, 1])

# 1st Differencing
axes[1, 0].plot(store1_train.Sales.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(store1_train.Sales.diff().dropna(), ax=axes[1, 1])

# 2nd Differencing
axes[2, 0].plot(store1_train.Sales.diff().diff()); axes[2, 0].set_title('2nd Order Differencing')
plot_acf(store1_train.Sales.diff().diff().dropna(), ax=axes[2, 1])

plt.show()

Plot output:

(image not available)

Upvotes: 0

Views: 526

Answers (1)

Arne Decker

Reputation: 923

The issue is in this line here: fig, axes = plt.subplots(3, 2, sharex=True)

If you set sharex=True, all subplots share a common x-axis. The original series is supposed to be plotted with dates on the x-axis, and the autocorrelation with numeric values, i.e. lags 0, 1, 2 and so on.

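On a shared axis, the dates and the lags have to use the same numeric scale. Dates are converted to day counts, with the numbering starting at 0 on 1970-01-01, so 2013-01-01 gets a value of roughly 15,700, while the lags are small numbers that land in early 1970. The shared axis therefore stretches from 1970 to 2015.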
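To make the numbers concrete, here is a small sketch using only the standard library. It assumes the day count starts at 1970-01-01 (Matplotlib's default epoch in recent versions, as far as I know); the helper name days_since_epoch is made up for illustration.

```python
from datetime import date, timedelta

# Assumption: day numbering starts at 0 on 1970-01-01
EPOCH = date(1970, 1, 1)

def days_since_epoch(d):
    """Hypothetical helper: the number a date gets on a numeric day axis."""
    return (d - EPOCH).days

first = days_since_epoch(date(2013, 1, 1))   # first date in the question's data
last = days_since_epoch(date(2015, 6, 30))   # last date in the question's data

# Going the other way: where a lag of 30 lands if it is read as a date
lag_as_date = EPOCH + timedelta(days=30)

print(first, last, lag_as_date)  # 15706 16616 1970-01-31
```

So a shared axis has to cover everything from about 0 (the lags) to about 16,600 (the last date), which is what shows up as 1970 to 2015.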

Set sharex=False and it should look normal again.

Upvotes: 2
