Kadaj13

Reputation: 1541

Looking for a simplified approach for calculating pairwise correlation among arrays

I have n arrays of length m. I want to compute the Pearson correlation for every pair of arrays and then take the average of those values.

The arrays are stored as a single NumPy array with shape (n, m).

One way to do this is with two nested for loops. However, I would like to know whether this can be written in Python in a simpler way.

My current code looks like this:

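import numpy as np

sum_dd = 0.0
counter_dd = 0
for i in range(len(stc_data_roi)):
    for j in range(i + 1, len(stc_data_roi)):
        # np.corrcoef returns a 2x2 matrix; [0, 1] is the correlation between the two arrays
        sum_dd += np.corrcoef(stc_data_roi[i], stc_data_roi[j])[0, 1]
        counter_dd += 1
avg_dd = sum_dd / counter_dd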

Upvotes: 0

Views: 595

Answers (1)

dufrmbgr

Reputation: 407

Suppose you have n = 4 arrays of length m = 5:

import numpy as np
import pandas as pd

n = 4
m = 5
X = np.random.rand(n, m)
print(X)

array([[0.49017121, 0.58751099, 0.87868983, 0.75328938, 0.16491984],
   [0.81175397, 0.26486309, 0.42424784, 0.37485824, 0.66667452],
   [0.80901099, 0.84121723, 0.36623767, 0.59928036, 0.22773295],
   [0.59606777, 0.63301654, 0.30963807, 0.82884099, 0.95136045]])

Now transpose the array and convert it to a DataFrame, so that each column of the DataFrame represents one array. Then use the pandas corr function, which computes the pairwise correlation between columns.

df = pd.DataFrame(X.T)
corr_coef = df.corr(method="pearson")
print(corr_coef)

corr_coef is an n x n matrix: the entry in row i, column j is the correlation coefficient between array i and array j. The diagonal holds each array's correlation with itself, which is always one.

          0         1         2         3
0  1.000000 -0.582567  0.226621 -0.709900
1 -0.582567  1.000000 -0.142663  0.182677
2  0.226621 -0.142663  1.000000 -0.173838
3 -0.709900  0.182677 -0.173838  1.000000

# Sum of the relevant coefficients, as in your code.
# Subtract n because we don't want the self-correlations on the diagonal (each is 1).
# Divide by 2 because the matrix is symmetric, so each pair is counted twice.
corr_coef_sum = (corr_coef.sum().sum() - n) / 2
num_pairs = n * (n - 1) / 2  # 6 pairs in our example case
corr_coef_avg = corr_coef_sum / num_pairs
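
If you would rather avoid pandas, the same average can probably be obtained with NumPy alone. The following is only a sketch, assuming the arrays are the rows of an (n, m) array with n >= 2 and that no row is constant (a constant row has an undefined correlation). The function name mean_pairwise_pearson is made up for illustration. By default np.corrcoef treats each row as one variable, and np.triu_indices(n, k=1) selects each pair i < j exactly once, so no subtraction or division by 2 is needed.

```python
import numpy as np


def mean_pairwise_pearson(data):
    """Average Pearson correlation over all distinct row pairs of an (n, m) array.

    Hypothetical helper for illustration; assumes n >= 2 and no constant rows.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    if n < 2:
        raise ValueError("need at least two arrays to form a pair")
    corr = np.corrcoef(data)        # (n, n) matrix; each row is one variable
    iu = np.triu_indices(n, k=1)    # indices strictly above the diagonal (i < j)
    return corr[iu].mean()          # mean over the n * (n - 1) / 2 pairs


# Example data with the same shape as above (n = 4 arrays of length m = 5).
rng = np.random.default_rng(0)
X = rng.random((4, 5))
avg = mean_pairwise_pearson(X)
print(avg)
```

With the data from the question, this would be called as mean_pairwise_pearson(stc_data_roi).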

Upvotes: 1
