Reputation: 2723
How does matplotlib compute autocorrelation differently from other libraries such as `pandas.tools.plotting` and `sm.graphics.tsa.plot_acf`?
Running the code below shows that the two libraries return different autocorrelation values: matplotlib returns only values greater than zero, while `pandas.tools.plotting` returns some negative values. This is apart from the other visible differences, namely the confidence interval and the negative lags on the x-axis.
```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd
from pandas.plotting import autocorrelation_plot  # formerly pandas.tools.plotting

# Yearly sunspot activity, 1700-2008, shipped with statsmodels
dta = sm.datasets.sunspots.load_pandas().data
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del dta["YEAR"]

# matplotlib: autocorrelation over all lags, drawn as a solid line
plt.acorr(dta['SUNACTIVITY'], maxlags=len(dta['SUNACTIVITY']) - 1,
          linestyle="solid", usevlines=False, marker='')
plt.show()

# pandas: autocorrelation plot with confidence bands
autocorrelation_plot(dta['SUNACTIVITY'])
plt.show()
```
Upvotes: 4
Views: 1975
Reputation: 2723
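Autocorrelation in pandas plotting and statsmodels graphics standardizes the data before computing the autocorrelation: these libraries subtract the mean of the series and normalize by its variance (the lag-0 value). matplotlib's `acorr` does not remove the mean by default (it calls `xcorr` with `detrend=mlab.detrend_none`) and only divides the raw correlation by its lag-0 value. Because `SUNACTIVITY` is never negative, every product in that raw sum is non-negative, so every value matplotlib reports is non-negative as well.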
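A minimal numpy sketch can make the difference concrete. It assumes matplotlib's default of no mean removal and re-implements both formulas by hand rather than calling either library; the function names and the short series are hypothetical stand-ins for the sunspot data.

```python
import numpy as np


def acorr_like_matplotlib(x):
    """Lags 0..n-1 of the autocorrelation as plt.acorr computes it by default:
    no mean removal, normalized so that lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    full = np.correlate(x, x, mode="full")  # lags -(n-1) .. n-1
    return full[n - 1:] / np.dot(x, x)      # keep lags 0 .. n-1


def acorr_like_pandas(x):
    """Lags 0..n-1 of the autocorrelation as pandas/statsmodels estimate it:
    subtract the mean first, then normalize by the lag-0 value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    c0 = np.dot(d, d)
    return np.array([np.dot(d[: n - h], d[h:]) / c0 for h in range(n)])


# Hypothetical non-negative series standing in for sunspot counts
x = np.array([5.0, 8.0, 12.0, 7.0, 3.0, 6.0, 11.0, 9.0])
raw = acorr_like_matplotlib(x)
demeaned = acorr_like_pandas(x)

print(raw.min() >= 0)      # True: products of non-negative values are never negative
print(demeaned.min() < 0)  # True: demeaned values can take either sign
print(demeaned[1:].sum())  # approximately -0.5
```

The last line reflects an identity of the demeaned estimator. The deviations from the mean sum to zero, so for any non-constant series the values at lags 1..n-1 always add up to -1/2. Some of them must therefore be negative. The raw, non-demeaned version of a non-negative series can never go below zero.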
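Subtracting a single mean implicitly assumes that the series is (weakly) stationary, i.e. that its mean stays constant over time. This may not be the case in reality.
Autocorrelation estimates are sensitive to this kind of preprocessing, and both functions (matplotlib and pandas plotting) have drawbacks. Without demeaning, matplotlib's values are dominated by the overall level of a series that sits far from zero. The demeaned estimate is only meaningful if the mean really is constant.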
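The following code standardizes the series before passing it to matplotlib. For positive lags, the resulting curve matches the autocorrelation drawn by pandas plotting or statsmodels graphics; matplotlib additionally mirrors it at negative lags.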
```python
# Standardize: subtract the mean, then divide by the standard deviation
dta['SUNACTIVITY_2'] = (dta['SUNACTIVITY'] - dta['SUNACTIVITY'].mean()) / dta['SUNACTIVITY'].std()

plt.acorr(dta['SUNACTIVITY_2'], maxlags=len(dta['SUNACTIVITY_2']) - 1,
          linestyle="solid", usevlines=False, marker='')
plt.show()
```
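To check the match numerically instead of comparing plots by eye, you can compare the values that `plt.acorr` returns (its first two return values are the lags and the correlations) with the estimator that pandas' `autocorrelation_plot` uses. The sketch below re-implements that pandas formula by hand rather than calling a pandas API. It also assumes a synthetic positive stand-in series instead of the sunspot data and uses the Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical positive series standing in for SUNACTIVITY
x = 50 + 40 * np.sin(np.arange(300) * 2 * np.pi / 11) + rng.uniform(0, 10, 300)
n = len(x)

# Standardize as in the answer (ddof=1 matches pandas' Series.std())
z = (x - x.mean()) / x.std(ddof=1)

# With usevlines=False, plt.acorr returns (lags, values, line, None)
lags, mpl_vals, _, _ = plt.acorr(z, maxlags=n - 1, usevlines=False,
                                 linestyle="solid", marker="")
plt.close("all")

# Demeaned estimator as in pandas' autocorrelation_plot, re-implemented by hand
mean = x.mean()
c0 = np.sum((x - mean) ** 2) / n
pandas_vals = np.array(
    [((x[: n - h] - mean) * (x[h:] - mean)).sum() / n / c0 for h in range(1, n)]
)

positive = lags > 0  # pandas plots lags 1..n only
print(np.allclose(mpl_vals[positive], pandas_vals))  # True
```

Dividing by the standard deviation is not what makes the curves agree. matplotlib normalizes by the lag-0 value, which cancels any constant scale, so subtracting the mean alone gives the same values. Alternatively, `acorr`/`xcorr` accept a `detrend` callable such as `matplotlib.mlab.detrend_mean`, which removes the mean without modifying the DataFrame.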
Upvotes: 2