Cheng

Reputation: 17944

Question about autocorrelation_plot result vs autocorr result

I used autocorrelation_plot to plot the autocorrelation of a straight line:

import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt

dr = pd.date_range(start='1984-01-01', end='1984-12-31')

df = pd.DataFrame(np.arange(len(dr)), index=dr, columns=["Values"])
autocorrelation_plot(df)
plt.show()


Then, I tried using autocorr() to calculate the autocorrelation with different lags:

for i in range(0,366):
    print(df['Values'].autocorr(lag=i))

The output is 1 (or 0.99) for all lags, but the correlogram clearly shows a curve rather than a straight line fixed at 1.

Did I interpret the correlogram incorrectly or did I use the autocorr() function incorrectly?

Upvotes: 3

Views: 1693

Answers (1)

Sander van den Oord

Reputation: 12818

You are using both functions correctly, but autocorrelation_plot calculates autocorrelations differently than autocorr() does.

The following two posts explain more about the differences. Unfortunately I don't know which way of calculating is the correct way:

What's the difference between pandas ACF and statsmodel ACF?

Why NUMPY correlate and corrcoef return different values and how to "normalize" a correlate in "full" mode?
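
As a rough illustration of the difference, here is a minimal sketch. autocorr(lag) is the Pearson correlation between the series and a copy of itself shifted by lag, so each overlapping segment is centered and scaled on its own. The helper plot_style_acf is a hypothetical name I made up; it assumes the plot uses a single mean and variance taken over the whole series and divides by the full length, which is my reading of the pandas plotting code and may differ between versions.

```python
import numpy as np
import pandas as pd

# Same straight line as in the question (1984 has 366 days).
values = pd.Series(np.arange(366, dtype=float))


def plot_style_acf(series, lag):
    """Hypothetical helper: autocorrelation with one global mean and variance.

    Assumption: this mirrors the calculation behind autocorrelation_plot.
    """
    data = np.asarray(series, dtype=float)
    n = len(data)
    mean = data.mean()
    c0 = np.sum((data - mean) ** 2) / n  # variance of the whole series
    return np.sum((data[: n - lag] - mean) * (data[lag:] - mean)) / n / c0


for lag in (1, 50, 100, 200, 300):
    pearson = values.autocorr(lag=lag)        # shifted copy is still a straight line
    global_norm = plot_style_acf(values, lag) # shrinks and changes sign as lag grows
    print(lag, round(pearson, 4), round(global_norm, 4))
```

For a straight line the shifted copy is still perfectly linear in the original, so the Pearson version stays at 1 for these lags, while the globally normalized version falls off with the lag and turns negative, which matches the curved correlogram.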

If you need it, you can get the autocorrelations out of your autocorrelation plot as follows:

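ax = autocorrelation_plot(df)
# lines[0] to lines[4] are the horizontal reference lines; lines[5] is the autocorrelation curve
ax.lines[5].get_data()[1]  # y-values of the curve, one per lag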

Upvotes: 3
