Reputation: 969
I am trying to determine the similarity between two 1D time series using numpy.correlate.
I wrote a small example program to learn more about how cross-correlation works; however, I do not completely understand the trend in the correlation output.
Code:
import numpy as np
import matplotlib.pyplot as plt

# Sample arrays to correlate
arr_1 = np.arange(1, 101)                                   # [1, 2, 3, ..., 100]
arr_2 = np.concatenate([np.zeros(50), np.arange(50, 101)])  # [0, 0, ..., 0, 50, 51, ..., 100]

cross_corr = np.correlate(arr_1, arr_2, "same")
plt.plot(cross_corr)
plt.show()
This graph raises a couple of questions for me. It is my understanding that cross-correlation relies on the convolution operation: essentially the integral (a sum, for discrete signals) of the product of two signals, with one of them shifted by some lag.
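For instance, that relationship can be checked directly. Here is a minimal sketch of my own (not part of the program above) showing that np.correlate matches np.convolve with the second signal reversed:

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.0, 1.0, 0.5])

# Correlation is convolution with the second signal reversed
# (and conjugated, which changes nothing for real inputs).
print(np.allclose(np.correlate(a, v, "full"),
                  np.convolve(a, v[::-1], "full")))  # True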
Upvotes: 0
Views: 862
Reputation: 10590
It seems that you are confused about what exactly is being output. The documentation is a little lacking, honestly. The output is the correlation between your two arrays at each lag. The midpoint is where the lag is 0, and it is where the correlation is highest.
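To make the lag axis explicit, here is a minimal sketch (my own illustration, using mode "full" so that every lag appears, and two equal-length ramps similar to yours):

import numpy as np

x = np.arange(1, 101, dtype=float)                                   # length 100
y = np.concatenate([np.zeros(50), np.arange(51, 101, dtype=float)])  # length 100

corr = np.correlate(x, y, "full")        # length 2*100 - 1 = 199, one value per lag
lags = np.arange(-(len(y) - 1), len(x))  # lag corresponding to each output element

print(lags[np.argmax(corr)])             # prints 0: correlation peaks at zero lag

With mode "same", numpy returns the middle len(x) values of this "full" output, so the zero-lag peak lands at the midpoint of your plot.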
FYI, your two arrays are not the same size: arr_1 is length 100 and arr_2 is length 101. Not sure if this was intentional.
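If the off-by-one was not intended, one possible fix (my guess at the intent, keeping the values 0, ..., 0, 50, 51, ..., 100) is to pad with 49 zeros instead of 50, so that both arrays have length 100:

import numpy as np

arr_1 = np.arange(1, 101)                                   # length 100
arr_2 = np.concatenate([np.zeros(49), np.arange(50, 101)])  # 49 + 51 = 100

cross_corr = np.correlate(arr_1, arr_2, "same")             # output length 100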
Upvotes: 1