Reputation: 19
I want to calculate the normalized cross-correlation function of two signals, where the x-axis is the time delay and the y-axis is the correlation value between -1 and 1, so I decided to use SciPy.
I use the command corr = signal.correlate(s1['Strain'], s2['Strain'], mode='full'), where s1['Strain'] and s2['Strain'] are pandas DataFrame columns, but it returns neither a normalized function nor an x-axis of time delays.
Here is example data
s1:
Strain
0 -1.587702e-22
1 -1.425868e-22
2 -1.174897e-22
3 -8.559119e-23
4 -4.949480e-23
. .
. .
. .
s2 looks similar. I know the sampling rate of both datasets: it is 4096 kHz.
Thanks for your help.
Upvotes: 0
Views: 14113
Reputation: 562
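First of all, to get a normalized coefficient (such that at lag 0 we get the Pearson correlation, provided both signals have the same length and zero mean; otherwise subtract the mean from each signal first):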
out = correlate(x/np.std(x), y/np.std(y), 'full') / min(len(x), len(y))
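A quick way to check this normalisation is to compare the lag-0 value against np.corrcoef. The following is only a minimal sketch: the helper name normalized_xcorr and the synthetic signals a and b are made up for illustration, and it assumes equal-length real-valued inputs.

```python
import numpy as np
from scipy.signal import correlate

def normalized_xcorr(x, y):
    """Cross-correlation scaled so that lag 0 is Pearson's r for equal-length inputs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove the mean and scale each signal to unit standard deviation.
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    return correlate(xs, ys, mode="full") / min(len(x), len(y))

# Hypothetical test signals: b is a noisy, scaled copy of a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 0.5 * a + rng.normal(size=200)

out = normalized_xcorr(a, b)
zero_lag = out[len(b) - 1]           # in 'full' mode, lag 0 sits at index len(y) - 1
pearson = np.corrcoef(a, b)[0, 1]    # reference value
print(zero_lag, pearson)
```

With the mean removed, every value of out stays within [-1, 1], which is what the question asks for on the y-axis.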
Now for the lags: the official documentation of correlate gives the full output of the cross-correlation as
z[k] = (x * y)(k - N + 1) = \sum_{l=0}^{||x||-1} x_l y_{l-k+N-1}^{*}
where (x * y) denotes the cross-correlation, the superscript * denotes complex conjugation, ||x|| is the length of x, N is max(len(x), len(y)), and k runs from 0 up to ||x|| + ||y|| - 2.
The lags are the argument of (x * y) above, so they range from 0 - N + 1 = -N + 1 to ||x|| + ||y|| - 2 - N + 1, which equals n - 1 with n = min(len(x), len(y)).
Also, from a brief look at the source code, I think x and y are sometimes swapped when convenient (hence the min(len(x), len(y)) in the normalisation above). This changes where the lags start, therefore:
N = max(len(x), len(y))
n = min(len(x), len(y))
if len(x) < len(y):
    lags = np.arange(-N + 1, n)
else:
    lags = np.arange(-n + 1, N)
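Both branches above reduce to np.arange(-len(y) + 1, len(x)). Newer SciPy releases (1.6 and later, as far as I know) also ship scipy.signal.correlation_lags, which returns the same array. A small sketch comparing the two; manual_lags is a hypothetical helper name that just wraps the logic above:

```python
import numpy as np
from scipy.signal import correlation_lags

def manual_lags(len_x, len_y):
    """Lag axis for mode='full', built by hand from the two signal lengths."""
    N = max(len_x, len_y)
    n = min(len_x, len_y)
    if len_x < len_y:
        return np.arange(-N + 1, n)
    return np.arange(-n + 1, N)

# Compare against SciPy's helper for shorter-x, longer-x and equal-length cases.
for len_x, len_y in [(5, 8), (8, 5), (6, 6)]:
    lags = correlation_lags(len_x, len_y, mode="full")
    print(len_x, len_y, np.array_equal(lags, manual_lags(len_x, len_y)))
```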
Try this code on the two time series whose cross-correlation you want to plot:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import correlate

def plot_xcorr(x, y):
    """Plot cross-correlation (full) between two signals."""
    N = max(len(x), len(y))
    n = min(len(x), len(y))
    if N == len(y):
        lags = np.arange(-N + 1, n)
    else:
        lags = np.arange(-n + 1, N)
    # Scale each signal by its standard deviation, then by the shorter length.
    c = correlate(x / np.std(x), y / np.std(y), 'full')
    plt.plot(lags, c / n)
    plt.show()
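The lags above are in samples. To get the time delay the question asks for on the x-axis, divide them by the sampling rate. Below is a rough sketch under my own assumptions: the function name xcorr_vs_delay, the parameter fs (samples per second) and the synthetic signals are all made up, and the value of fs is only a placeholder to be replaced by the real sampling rate.

```python
import numpy as np
from scipy.signal import correlate

def xcorr_vs_delay(x, y, fs):
    """Return (delays in seconds, scaled cross-correlation) for signals sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = min(len(x), len(y))
    c = correlate(x / np.std(x), y / np.std(y), mode="full") / n
    lags = np.arange(-len(y) + 1, len(x))   # lag axis in samples
    return lags / fs, c                      # samples -> seconds

fs = 4096.0                                  # placeholder sampling rate in Hz
rng = np.random.default_rng(1)
s1 = rng.normal(size=1024)
s2 = np.concatenate([np.zeros(10), s1[:-10]])   # s1 delayed by 10 samples

delays, c = xcorr_vs_delay(s1, s2, fs)
best_delay = delays[np.argmax(c)]            # delay (in seconds) at the correlation peak
print(best_delay)
```

Passing delays instead of lags to plt.plot gives the x-axis in seconds. With this argument order the peak sits at a negative delay when the second signal is the delayed one.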
Upvotes: 4
Reputation: 1
To calculate the time delay between two signals, compute their cross-correlation and find the lag at which it peaks (the argmax).
Assuming data_1 and data_2 are samples of the two signals:
import numpy as np
correlation = np.correlate(data_1, data_2, mode='same')
# Lag 0 sits at the centre of the 'same' output, so subtract the centre index.
delay = np.argmax(correlation) - len(correlation) // 2
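To see what sign the result has, here is a small self-contained check on synthetic data. It is only a sketch: the signals are made up, and it assumes both arrays have the same length so that the centre index of the 'same' output corresponds to lag 0.

```python
import numpy as np

rng = np.random.default_rng(2)
data_2 = rng.normal(size=256)
# data_1 is data_2 delayed by 7 samples (zero-padded at the start).
data_1 = np.concatenate([np.zeros(7), data_2[:-7]])

correlation = np.correlate(data_1, data_2, mode='same')
delay = np.argmax(correlation) - len(correlation) // 2   # delay in samples
print(delay)
```

A positive delay here means data_1 lags behind data_2. Divide by the sampling rate to convert it to seconds.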
Upvotes: 0