Reputation: 219
I want to calculate auto-covariance of 3 arrays X1, X2 and Y which are all stationary random process. Is there any function in sciPy or other library can solve this problem?
Upvotes: 19
Views: 20288
Reputation: 823
@user333700 has the right answer. Using a library (such as statsmodels) is generally preferred over writing your own. However, it is insightful to implement your own at least once.
def _check_autocovariance_input(x):
if len(x) < 2:
raise ValueError('Need at least two elements to calculate autocovariance')
def get_autocovariance_given_lag(x, lag):
_check_autocovariance_input(x)
x_centered = x - np.mean(x)
a = np.pad(x_centered, pad_width=(0, lag), mode='constant')
b = np.pad(x_centered, pad_width=(lag, 0), mode='constant')
return np.dot(a, b) / len(x)
def get_autocovariance(x):
_check_autocovariance_input(x)
x_centered = x - np.mean(x)
return np.correlate(x_centered, x_centered, mode='full')[len(x) - 1:] / len(x)
The function I have get_autocovariance_given_lag
calculates the autocovariance for a given lag.
If you are interested in all lags, the get_autocovariance
can be used. The np.correlate
function is what statsmodels
uses under the hood. It calculates the cross correlation. This is a sliding dot product. For example, suppose the array is [1, 2, 3]. Then we get:
[1, 2, 3] = 3 * 1 = 3
[1, 2, 3]
[1, 2, 3] = 2 * 1 + 3 * 2 = 8
[1, 2, 3]
[1, 2, 3] = 1 * 1 + 2 * 2 + 3 * 3 = 14
[1, 2, 3]
[1, 2, 3] = 2 * 1 + 3 * 2 = 8
[1, 2, 3]
[1, 2, 3] = 3 * 1 = 3
[1, 2, 3]
But note we are interested in the covariance that starts at lag 0. Where is this? Well, this occurs after we have moved N - 1
positions to the right where N
is the length of the array. This is why we return the array starting at N-1
.
Upvotes: 1
Reputation: 415
A small tweak to the previous answers, which avoids python for
loops and uses numpy array operations instead. This will be quicker if you have a lot of data.
def lagged_auto_cov(Xi,t):
"""
for series of values x_i, length N, compute empirical auto-cov with lag t
defined: 1/(N-1) * \sum_{i=0}^{N-t} ( x_i - x_s ) * ( x_{i+t} - x_s )
"""
N = len(Xi)
# use sample mean estimate from whole series
Xs = np.mean(Xi)
# construct copies of series shifted relative to each other,
# with mean subtracted from values
end_padded_series = np.zeros(N+t)
end_padded_series[:N] = Xi - Xs
start_padded_series = np.zeros(N+t)
start_padded_series[t:] = Xi - Xs
auto_cov = 1./(N-1) * np.sum( start_padded_series*end_padded_series )
return auto_cov
Comparing this against @bluevoxel's code, using a time-series of 50,000 data points and computing the auto-correlation for a single fixed value of lag, the python for
loop code averaged about 30 milli-seconds and using numpy arrays averaged faster than 0.3 milli-seconds (running on my laptop).
Upvotes: 5
Reputation: 21
Get sample auto covariance:
# cov_auto_samp(X,delta)/cov_auto_samp(X,0) = auto correlation
def cov_auto_samp(X,delta):
N = len(X)
Xs = np.average(X)
autoCov = 0.0
times = 0.0
for i in np.arange(0, N-delta):
autoCov += (X[i+delta]-Xs)*(X[i]-Xs)
times +=1
return autoCov/times
Upvotes: 2
Reputation: 5358
According to the standard estimation of the autocovariance coefficient for discrete signals, which can be expressed by equation:
...where x(i)
is a given signal (i.e specific 1D vector), k
stands for the shift of x(i)
signal by k
samples, N
is the length of x(i)
signal, and:
...which is simple average, we can write:
'''
Calculate the autocovarriance coefficient.
'''
import numpy as np
Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)
def autocovariance(Xi, N, k, Xs):
autoCov = 0
for i in np.arange(0, N-k):
autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
return (1/(N-1))*autoCov
print("Autocovariance:", autocovariance(Xi, N, k, Xs))
If you would like to normalize the autocovariance coefficient, which will become the autocorrelation coefficient expressed as:
...than you just have to add to the above code just two additional lines:
def autocorrelation():
return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)
Here is full script:
'''
Calculate the autocovarriance and autocorrelation coefficients.
'''
import numpy as np
Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5
Xs = np.average(Xi)
def autocovariance(Xi, N, k, Xs):
autoCov = 0
for i in np.arange(0, N-k):
autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
return (1/(N-1))*autoCov
def autocorrelation():
return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)
print("Autocovariance:", autocovariance(Xi, N, k, Xs))
print("Autocorrelation:", autocorrelation())
Upvotes: 8