Yas

Reputation: 901

Compute correlation coefficient between rows of two matrices

Given two matrices A and B in Python (NumPy arrays), I would like to compute the correlation between the rows of the two matrices. Both matrices are of size 5*7.

I would like to compute the correlation between row i of A and row j of B for every pair with j >= i, and then average the correlations:

A = data_All_Features_rating1000_topk_nr
B = data_All_Features_rating1000_leastk_nr

corr_1 = corrcoeff(A[0,:],B[0,:])
corr_2 = corrcoeff(A[0,:],B[1,:])
corr_3 = corrcoeff(A[0,:],B[2,:])
corr_4 = corrcoeff(A[0,:],B[3,:])
corr_5 = corrcoeff(A[0,:],B[4,:])

corr_6 = corrcoeff(A[1,:],B[1,:])
corr_7 = corrcoeff(A[1,:],B[2,:])
corr_8 = corrcoeff(A[1,:],B[3,:])
corr_9 = corrcoeff(A[1,:],B[4,:])

corr_10 = corrcoeff(A[2,:],B[2,:])
corr_11 = corrcoeff(A[2,:],B[3,:])
corr_12 = corrcoeff(A[2,:],B[4,:])

corr_13 = corrcoeff(A[3,:],B[3,:])
corr_14 = corrcoeff(A[3,:],B[4,:])

corr_15 = corrcoeff(A[4,:],B[4,:])


corravg = avg(corr_1,corr_2,...,corr_15)

This is what I do:

topk = 5 
corr_res = []
p = 0 ;
for i in range(0,topk):
    for j in range(i,topk):
        a = data_All_Features_rating1000_topk_nr[i,:]
        b = data_All_Features_rating1000_leastk_nr[j,:]
        tmp = np.corrcoef(a,b)
        print tmp[0,1]
        corr_res = corr_res.extend(tmp[0,1])  

I get this error:

     ---------------------------------------------------------------------------
     TypeError                                 Traceback (most recent call last)
     <ipython-input-159-ab1d737eed71> in <module>()
     22             tmp = np.corrcoef(a,b)
     23             print tmp[0,1]
---> 24             corr_res = corr_res.extend(tmp[0,1])
     25            # print p+1
     26            # print corr_res

     TypeError: 'numpy.float64' object is not iterable

Upvotes: 2

Views: 3191

Answers (1)

kvorobiev

Reputation: 5070

An efficient way to perform matrix operations in Python is to use the NumPy library. For the correlation coefficient specifically, use the numpy.corrcoef function (numpy.correlate computes the cross-correlation of two sequences, which is not the Pearson coefficient). To compute the correlation for every pair of rows A[i], B[j] with j >= i, you could use

import numpy as np
A = np.array([[1, 2, 3, 4], [2, 3, 5, 6], [1, 3, 4, 5], [7, 8, 2, 3]])
B = np.array([[1, 2, 3, 4], [3, 5, 6, 2], [3, 2, 4, 1], [9, 8, 2, 1]])
corr = []
for i in range(len(A)):
    for j in range(i, len(B)):
        # np.corrcoef returns a 2x2 matrix; element [0, 1] is corr(A[i], B[j])
        corr.append(np.corrcoef(A[i], B[j])[0, 1])
corr_avg = np.average(corr)
print(corr_avg)
print(" ".join(map(str, corr)))
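
As a possible alternative to the explicit double loop, the Pearson coefficients for all row pairs can be computed in a single np.corrcoef call and the pairs with j >= i selected afterwards. This is only a sketch: the helper name mean_upper_row_corr is made up for illustration, and it assumes that A and B have the same shape and that no row is constant (a constant row would produce nan).

```python
import numpy as np

def mean_upper_row_corr(A, B):
    """Average Pearson correlation between A[i] and B[j] for all j >= i.

    Hypothetical helper; assumes A and B have the same shape.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    # np.corrcoef(A, B) treats the rows of A and B as 2n variables and
    # returns a (2n x 2n) matrix; the top-right n x n block is corr(A[i], B[j]).
    cross = np.corrcoef(A, B)[:n, n:]
    # Keep only the pairs with j >= i (upper triangle, diagonal included).
    rows, cols = np.triu_indices(n)
    return cross[rows, cols].mean()

A = np.array([[1, 2, 3, 4], [2, 3, 5, 6], [1, 3, 4, 5], [7, 8, 2, 3]])
B = np.array([[1, 2, 3, 4], [3, 5, 6, 2], [3, 2, 4, 1], [9, 8, 2, 1]])
print(mean_upper_row_corr(A, B))
```

For 5 rows the difference in speed is negligible; the vectorized form mainly saves code and should scale better to larger matrices.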

UPDATE

Instead of

print tmp[0,1]
corr_res = corr_res.extend(tmp[0,1])

Try

print tmp[0,1]
corr_res.append(tmp[0,1])

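The list method extend takes an iterable (another list, a tuple, ...) and adds each of its elements to the list, so passing a single numpy.float64 raises the TypeError shown in the question. To add a scalar value to a list, use the append method. Both methods modify the list in place and return None, so the result should not be assigned back to corr_res.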
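
To illustrate the difference between the two list methods, here is a small sketch. The numbers are placeholders chosen for the example, not results from the question's data.

```python
corr_res = []
corr_res.append(0.25)          # append adds a single element
corr_res.extend([0.5, 0.75])   # extend adds every element of an iterable
print(corr_res)

# Both methods change the list in place and return None, so assigning
# the result back to the list variable would discard the list.
result = corr_res.extend([1.0])
print(result)

try:
    corr_res.extend(0.5)       # a bare float is not iterable
except TypeError as exc:
    print("TypeError: %s" % exc)
```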

Upvotes: 2
