Reputation: 1864
I'm trying to understand how cross-correlation is used to determine the similarity of two signals. This tutorial offers a very clear explanation of the basics, but I still don't understand how to use normalization effectively to prevent strong signals from dominating the cross-correlation measure when the signals have different energy levels. The same tutor, David Dorran, discusses the issue of normalization here, and explains how to normalize the correlation using the dot product, but I still have some questions.
I wrote this Python routine to compute the cross-correlation between every pair of signals in a group of signals:
import numpy as np
import pandas as pd

def mycorrelate2d(df, normalized=False):
    # one cell per pair of signals (rows of df); each cell holds an array of lags
    n_signals = len(df)
    ccm = np.zeros(shape=(n_signals, n_signals), dtype=object)
    for i, row_dict1 in enumerate(df.to_dict(orient='records')):
        outer_row = list(row_dict1.values())
        for j, row_dict2 in enumerate(df.to_dict(orient='records')):
            inner_row = list(row_dict2.values())
            # mode='full' returns every lag (5 values for two length-3 signals)
            x = np.correlate(inner_row, outer_row, mode='full')
            if normalized:
                # divide by the dot product of the two signals
                n = np.dot(inner_row, outer_row)
                x = x / n
            ccm[i][j] = x
    return ccm
Suppose I have 3 signals of increasing magnitude: [1, 2, 3], [4, 5, 6] and [7, 8, 9]
I want to cross-correlate these three signals to see which pairs are similar, but when I pass them into the routine I wrote, I don't appear to get a measure of similarity. The size of the cross-correlation values is just a function of the signal energy. Period. Even the cross-correlation of a signal with itself yields lower values than the cross-correlation of that same signal with another signal of higher energy.
df_x3 = pd.DataFrame(
    np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]).reshape(3, -1))
mycorrelate2d(df_x3)
This yields:
array([[array([ 3, 8, 14, 8, 3]),
array([12, 23, 32, 17, 6]),
array([21, 38, 50, 26, 9])],
[array([ 6, 17, 32, 23, 12]),
array([24, 50, 77, 50, 24]),
array([ 42, 83, 122, 77, 36])],
[array([ 9, 26, 50, 38, 21]),
array([ 36, 77, 122, 83, 42]),
array([ 63, 128, 194, 128, 63])]], dtype=object)
Now, I pass in the same 3 signals, but this time I indicate that I want normalized results:
mycorrelate2d(df_x3, normalized=True)
This yields:
array([[array([ 0.2142, 0.5714, 1., 0.5714, 0.2142]),
array([ 0.375, 0.71875, 1., 0.5312, 0.1875]),
array([ 0.42, 0.76, 1., 0.52, 0.18])],
[array([ 0.1875, 0.5312, 1., 0.7187, 0.375]),
array([ 0.3116, 0.6493, 1., 0.6493, 0.3116]),
array([ 0.3442, 0.6803, 1., 0.6311, 0.2950])],
[array([ 0.18, 0.52, 1., 0.76, 0.42]),
array([ 0.2950, 0.6311, 1., 0.6803, 0.3442]),
array([ 0.3247, 0.6597, 1., 0.6597, 0.3247])]],
dtype=object)
All the max values are now 1!! So we went from having max values that were based on spurious differences to having no difference between the max values at all! I readily confess that I do not understand how cross-correlation is used to detect similarity between signals. What is the analytical workflow of someone comparing signals with cross-correlation?
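One way to see why every center value becomes 1: in `full` mode, the middle (zero-lag) element of `np.correlate` is exactly the dot product of the two signals, so dividing by `np.dot` forces that element to 1 for any pair, similar or not. A minimal sketch, assuming equal-length signals; the helper name `zero_lag` is made up here for illustration:

```python
import numpy as np

def zero_lag(a, b):
    """Return the zero-lag element of the full cross-correlation of a and b."""
    full = np.correlate(a, b, mode="full")
    # In 'full' mode the zero-lag term sits at index len(b) - 1.
    return full[len(b) - 1]

a = np.array([1, 2, 3])
b = np.array([7, 8, 9])

# The zero-lag value and the dot product are the same number.
print(zero_lag(a, b), np.dot(a, b))
# So their ratio is 1 regardless of how similar a and b are.
print(zero_lag(a, b) / np.dot(a, b))
```

This suggests the dot product of the pair is not a useful normalizer on its own, since it cancels the very value being compared.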
Upvotes: 2
Views: 2620
Reputation: 469
Take a look at Compute Normalized Cross-Correlation in Python
So the formula you are using for normalization is not quite correct. In NCC the normalization happens before we correlate: each signal has its mean subtracted and is divided by its standard deviation. We then divide the answer by the vector length, as shown in this Wikipedia formula: https://en.wikipedia.org/wiki/Cross-correlation#Zero-normalized_cross-correlation_(ZNCC)
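For reference, the zero-lag form of that formula for two length-$n$ signals $f$ and $t$ can be written as below. The symbols follow the Wikipedia article rather than anything in the question: $\mu$ is a signal's mean and $\sigma$ its (population) standard deviation.

```latex
\mathrm{ZNCC}(f, t) = \frac{1}{n} \sum_{x=1}^{n}
  \frac{\bigl(f(x) - \mu_f\bigr)\,\bigl(t(x) - \mu_t\bigr)}{\sigma_f \, \sigma_t}
```

The value lies in $[-1, 1]$ and does not change when either signal is shifted by a constant or scaled by a positive factor.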
So you need something like
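import numpy as np

def mycorrelate2d(df, normalized=False):
    # one correlation value per pair of signals (rows of df)
    n = len(df)
    ccm = np.zeros((n, n))
    for i in range(n):
        outer_row = df[i]
        for j in range(n):
            inner_row = df[j]
            if not normalized:
                # default 'valid' mode: a single zero-lag value for equal-length rows
                x = np.correlate(inner_row, outer_row)
            else:
                # subtract the mean, divide by the standard deviation,
                # and fold the 1/length factor into one of the signals
                a = (inner_row - np.mean(inner_row)) / (np.std(inner_row) * len(inner_row))
                b = (outer_row - np.mean(outer_row)) / np.std(outer_row)
                x = np.correlate(a, b)
            # np.correlate returns a length-1 array here; store its scalar
            ccm[i][j] = x[0]
    return ccm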
df_x3 = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]]).reshape(3, -1)
print(mycorrelate2d(df_x3, True))

df_x3 = np.array([[1, 2, 3],
                  [9, 5, 6],
                  [74, 8, 9]]).reshape(3, -1)
print(mycorrelate2d(df_x3, True))
The output is
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
[[ 1. -0.72057669 -0.85933941]
[-0.72057669 1. 0.97381599]
[-0.85933941 0.97381599 1. ]]
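As a sanity check, the zero-lag value computed this way is the Pearson correlation coefficient, so it should agree with `np.corrcoef`. The sketch below assumes equal-length rows with non-zero variance, and `zncc` is a name chosen here for illustration. It also accounts for the all-ones matrix in the first output: [4, 5, 6] and [7, 8, 9] are just [1, 2, 3] plus a constant, and subtracting the mean removes that offset.

```python
import numpy as np

def zncc(a, b):
    """Zero-lag zero-normalized cross-correlation of two equal-length signals."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove the mean and scale by the standard deviation; the 1/length
    # factor is folded into the first signal.
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    # 'valid' mode on equal-length inputs returns a single zero-lag value.
    return np.correlate(a, b)[0]

signals = np.array([[1, 2, 3],
                    [9, 5, 6],
                    [74, 8, 9]])

ours = np.array([[zncc(r, s) for s in signals] for r in signals])
# np.corrcoef treats each row as one variable.
print(np.allclose(ours, np.corrcoef(signals)))
```

If the signals may be shifted in time relative to each other, this zero-lag value is not enough on its own; comparing across lags would need `mode="full"` and a normalization that does not cancel the peak.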
Upvotes: 3