Reputation: 969
I am trying to determine the similarity between two 1D time series using numpy.correlate.
I wrote a small example program to learn more about how cross-correlation works; however, I do not completely understand the trend in the correlation output.
Code:
import numpy as np
import matplotlib.pyplot as plt

# Sample arrays to correlate
arr_1 = np.arange(1, 101)                                   # [1, 2, 3, ..., 100]
arr_2 = np.concatenate([np.zeros(50), np.arange(50, 101)])  # [0, 0, ..., 0, 50, 51, ..., 100]

cross_corr = np.correlate(arr_1, arr_2, "same")
plt.plot(cross_corr)
plt.show()
This graph raises a couple of questions for me. It is my understanding that cross-correlation relies on the convolution operation: essentially the integral (a sum, for discrete signals) of the product of two signals, with one of them shifted by some lag.
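For instance, that relationship can be checked directly. Here is a minimal sketch of my own (not part of the program above) showing that np.correlate matches np.convolve with the second signal reversed:

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.0, 1.0, 0.5])

# Correlation is convolution with the second signal reversed
# (and conjugated, which changes nothing for real inputs).
print(np.allclose(np.correlate(a, v, "full"),
                  np.convolve(a, v[::-1], "full")))  # True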
Upvotes: 0
Views: 862
Reputation: 10590
It seems that you are confused about what exactly is being output. The documentation is a little lacking, honestly. The output is the correlation between your two arrays at each lag. The midpoint is where the lag is 0, and it is where the correlation is highest.
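To make the lag axis explicit, here is a minimal sketch (my own illustration, using mode "full" so that every lag appears, and two equal-length ramps similar to yours):

import numpy as np

x = np.arange(1, 101, dtype=float)                                   # length 100
y = np.concatenate([np.zeros(50), np.arange(51, 101, dtype=float)])  # length 100

corr = np.correlate(x, y, "full")        # length 2*100 - 1 = 199, one value per lag
lags = np.arange(-(len(y) - 1), len(x))  # lag corresponding to each output element

print(lags[np.argmax(corr)])             # prints 0: correlation peaks at zero lag

With mode "same", numpy returns the middle len(x) values of this "full" output, so the zero-lag peak lands at the midpoint of your plot.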
FYI, your two arrays are not the same size: arr_1 is length 100 and arr_2 is length 101. Not sure if this was intentional.
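If the off-by-one was not intended, one possible fix (my guess at the intent, keeping the values 0, ..., 0, 50, 51, ..., 100) is to pad with 49 zeros instead of 50, so that both arrays have length 100:

import numpy as np

arr_1 = np.arange(1, 101)                                   # length 100
arr_2 = np.concatenate([np.zeros(49), np.arange(50, 101)])  # 49 + 51 = 100

cross_corr = np.correlate(arr_1, arr_2, "same")             # output length 100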
Upvotes: 1