Orel Vaknin

Reputation: 35

Using numpy.corrcoef in Python to reproduce Matlab's corr function

After I tried every solution I found online, I must ask here.

I want to achieve the behavior of matlab's corr function:
I have 2 matrices A and B.
A's shape: (200, 30000)
B's shape: (200, 1)

In Matlab, corr(A, B) returns a matrix of size (30000, 1). When I use numpy.corrcoef (or dask for better performance), I get a (30001, 30001) matrix, which is extremely large and not the answer I want. I tried the argument rowvar=False, as some answers suggested, but that didn't work either.

I even tried scipy.spatial.distance.cdist(np.transpose(traces), np.transpose(my_trace), metric='correlation'), which did return a matrix of shape (30000, 1) as expected, but the values were different from the result in Matlab.

I am desperate for a solution for this problem, please help.

Upvotes: 1

Views: 1736

Answers (2)

Naga Raj

Reputation: 45

With the following version, you can get Matlab's corr from NumPy's corrcoef (np.absolute drops the signs, so remove it if you need signed coefficients):

Corr = np.absolute(np.corrcoef(A.T, B.T))
Corr = Corr[0:A.shape[1],-B.shape[1]:]

Upvotes: 0

Ehsan

Reputation: 12407

Matlab's corr by default calculates the correlation between the columns of A and the columns of B, while NumPy's corrcoef by default treats each row of an array as a variable (if you pass it two arrays, it stacks them vertically and does the same). If you do not care about performance and want an easy way to do it, you can stack the two arrays horizontally, compute the correlation matrix with rowvar=False, and pick out the elements you need:

correlation = np.corrcoef(np.hstack((B,A)),rowvar=False)[0,1:]

But if you care about performance more than simple code, you would need to implement the corr function yourself. (Please comment and I will add it if that is what you are looking for.)

UPDATE: If you would like to implement corr yourself to avoid the extra computation and memory usage, you can calculate the correlation from its formula by first standardizing the arrays and then multiplying them:

A = (A - A.mean(axis=0))/A.std(axis=0)
B = (B - B.mean(axis=0))/B.std(axis=0)
correlation = (np.dot(B.T, A)/B.shape[0])[0]
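The normalization above can be wrapped into a small helper. This is only a minimal sketch: the name corr_columns is hypothetical (it is not a NumPy function), it assumes both inputs are 2-D with the same number of rows, and it relies on NumPy's default population standard deviation (ddof=0), which is why dividing by the row count gives the Pearson coefficient. The random test data is made up for illustration.

```python
import numpy as np

def corr_columns(A, B):
    """Pearson correlation between each column of A and each column of B.

    Hypothetical helper (not part of NumPy). Returns an array of shape
    (A.shape[1], B.shape[1]), the same layout as Matlab's corr(A, B).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    # Standardize each column: zero mean, unit population std (ddof=0).
    A_z = (A - A.mean(axis=0)) / A.std(axis=0)
    B_z = (B - B.mean(axis=0)) / B.std(axis=0)
    # The mean of the products of z-scores is the correlation coefficient.
    return A_z.T @ B_z / n

# Made-up data with the same layout as in the question, but fewer columns.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 300))
B = rng.normal(size=(200, 1))

result = corr_columns(A, B)  # shape (300, 1)
# Cross-check against the simple (but memory-hungry) corrcoef approach.
reference = np.corrcoef(np.hstack((B, A)), rowvar=False)[0, 1:]
```

Only an array of shape (A.shape[1], B.shape[1]) is allocated for the result, so the full square correlation matrix is never built. A column with zero variance would produce a division by zero here, which this sketch does not handle.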

Output of sample code (note that here A is the single column and B has five columns):

A = np.array([1,2,2,2]).reshape(4,1)
B = np.arange(20).reshape(4,5)

Python: np.corrcoef(np.hstack((A,B)),rowvar=False)[0,1:]

[0.77459667 0.77459667 0.77459667 0.77459667 0.77459667]

Matlab:  corr(A,B)

0.7746    0.7746    0.7746    0.7746    0.7746
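Regarding the scipy attempt in the question: one likely explanation for the mismatch is that cdist with metric='correlation' returns the correlation distance, which is 1 minus the correlation coefficient, not the coefficient itself. The sketch below assumes that is the cause and reuses the sample arrays from above; the variable names dist and corr_from_dist are just for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Same sample data as above: A is one column, B has five columns.
A = np.array([1, 2, 2, 2]).reshape(4, 1)
B = np.arange(20).reshape(4, 5)

# cdist compares rows, so transpose to compare the original columns.
dist = cdist(B.T, A.T, metric='correlation')  # shape (5, 1), values are 1 - r
corr_from_dist = 1.0 - dist                   # convert distance back to r

# Reference values from the corrcoef approach shown earlier.
reference = np.corrcoef(np.hstack((A, B)), rowvar=False)[0, 1:]
```

Here corr_from_dist has one row per column of B, and its values should agree with reference once the distance is converted back.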

Upvotes: 5
