Shravan

Reputation: 2723

What is the difference between auto-correlation in matplotlib and auto-correlation in pandas.tools.plotting?

How does the computation of autocorrelation in matplotlib differ from that in other libraries such as pandas.tools.plotting and sm.graphics.tsa.plot_acf?

The code below shows that the autocorrelation values returned by the two libraries differ: matplotlib returns only autocorrelation values greater than zero, whereas pandas.tools.plotting returns some negative values (apart from the differences in the confidence interval and the negative x-axis).

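import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd
# In current pandas releases the plotting helpers live in pandas.plotting
# (older releases exposed them as pandas.tools.plotting).
from pandas.plotting import autocorrelation_plot

# Load the yearly sunspot data and index it by date.
dta = sm.datasets.sunspots.load_pandas().data
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del dta["YEAR"]

# Autocorrelation as computed by matplotlib, over all available lags.
plt.acorr(dta['SUNACTIVITY'], maxlags=len(dta['SUNACTIVITY']) - 1,
          linestyle="solid", usevlines=False, marker='')
plt.show()

# Autocorrelation as computed by pandas.
autocorrelation_plot(dta['SUNACTIVITY'])
plt.show()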

Upvotes: 4

Views: 1975

Answers (1)

Shravan

Reputation: 2723

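The autocorrelation functions in pandas plotting and statsmodels graphics standardize the data before computing the autocorrelation: they subtract the mean and divide by the standard deviation of the data. matplotlib's acorr, with its default settings, does not subtract the mean; it only scales the result so that the value at lag 0 is 1. That is why an all-positive series such as the sunspot data yields only positive values in matplotlib.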

Standardizing the data this way assumes that it was generated by a Gaussian law (with a certain mean and standard deviation). This may not be the case in reality.

Correlation is sensitive to this kind of preprocessing, so both approaches (matplotlib and pandas plotting) have their drawbacks.

The figure generated by the following matplotlib code shows the same autocorrelation values as the figure generated by pandas plotting or statsmodels graphics:

# Standardize the series first: subtract the mean, divide by the standard deviation.
sunspots = dta['SUNACTIVITY']
dta['SUNACTIVITY_2'] = (sunspots - sunspots.mean()) / sunspots.std()

# matplotlib's acorr on the standardized series.
plt.acorr(dta['SUNACTIVITY_2'], maxlags=len(dta['SUNACTIVITY_2']) - 1,
          linestyle="solid", usevlines=False, marker='')
plt.show()
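
As a library-free check of the explanation above, here is a minimal sketch using only NumPy. The helper names acorr_raw and acorr_demeaned and the small sample array are made up for illustration; they are not part of matplotlib or pandas. acorr_raw is meant to mimic what acorr does by default (no mean removal, scaled so lag 0 equals 1), and acorr_demeaned mimics the pandas/statsmodels behaviour under that same assumption.

```python
import numpy as np

def acorr_raw(x):
    """Autocorrelation without mean removal, scaled so lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    full = np.correlate(x, x, mode="full")  # lags -(n-1) .. (n-1)
    c = full[len(x) - 1:]                   # keep lags 0 .. n-1
    return c / c[0]

def acorr_demeaned(x):
    """Same computation after subtracting the mean of the series."""
    x = np.asarray(x, dtype=float)
    return acorr_raw(x - x.mean())

# Hypothetical all-positive sample standing in for the sunspot series.
x = np.array([5.0, 11.0, 16.0, 23.0, 36.0, 58.0, 29.0, 20.0, 10.0, 8.0])

raw = acorr_raw(x)
dem = acorr_demeaned(x)
std = acorr_raw((x - x.mean()) / x.std())  # fully standardized input

print("all raw values positive:    ", bool(np.all(raw > 0)))
print("some demeaned values < 0:   ", bool(np.any(dem < 0)))
print("dividing by std changes it: ", not np.allclose(std, dem))
```

In this sketch, dividing by the standard deviation makes no difference once the result is scaled by its lag-0 value, so the mean subtraction alone accounts for the negative values.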

Source code:

Matplotlib

Pandas

Upvotes: 2
