Comparing vectors from two groups with permutation test and cosine similarity

Question

I would like to compare different vectors (1 per subject) between two groups. What I want to do is similar to the work performed in this paper.

https://www.pnas.org/content/early/2020/08/14/2003181117 Figure 2B.

So, I have already a list of normalized vectors for each group such as:

X = array([[0.8081178 , 0.1618492 , 1.        , 0.        , 0.52503616],
       [0.9155495 , 0.9229482 , 0.55023754, 0.        , 1.        ],
       [0.5497678 , 1.        , 0.5295068 , 0.        , 0.9580641 ],
       [0.8554752 , 0.        , 1.        , 0.27967405, 0.43231127],
       [0.8771384 , 0.15983552, 1.        , 0.24160399, 0.        ],
       [1.        , 0.        , 0.34030336, 0.8518671 , 0.14370875],
       [0.96829957, 0.89825296, 0.9989327 , 0.        , 1.        ],
       [0.19713035, 1.        , 0.8313886 , 0.        , 0.69545555],
       [1.        , 0.        , 0.15145707, 0.62412727, 0.19574052],
       [1.        , 0.        , 0.6768882 , 0.3267132 , 0.53155863],
       [0.        , 0.11568664, 1.        , 0.06043369, 0.2405336 ],
       [1.        , 0.7901962 , 0.55479664, 0.        , 0.21075204],
       [0.8389194 , 0.9723087 , 0.9122212 , 0.        , 1.        ],
       [1.        , 0.        , 0.74783736, 0.27481842, 0.54764044],
       [0.7932238 , 0.78063756, 1.        , 0.        , 0.76313186],
       [0.        , 0.28478605, 1.        , 0.48485696, 0.5902692 ]])

Y = array([[1.        , 0.8730191 , 0.72493815, 0.        , 0.9373017 ],
       [1.        , 0.8563728 , 0.71862656, 0.        , 0.74088454],
       [0.878855  , 0.8799178 , 1.        , 0.        , 0.8985272 ],
       [0.94998175, 0.924029  , 0.74815565, 0.        , 1.        ],
       [1.        , 0.4086177 , 0.3750266 , 0.        , 0.87822354],
       [0.85906726, 1.        , 0.37570593, 0.        , 0.9324212 ],
       [0.8055762 , 1.        , 0.85996395, 0.        , 0.9541106 ],
       [0.96801126, 1.        , 0.72156   , 0.        , 0.8689768 ],
       [1.        , 0.9446373 , 0.5445604 , 0.        , 0.56854314],
       [0.86714363, 1.        , 0.6032697 , 0.        , 0.7075365 ],
       [1.        , 0.8875634 , 0.8770225 , 0.        , 0.8542803 ],
       [1.        , 0.93619907, 0.8262237 , 0.        , 0.87035996],
       [1.        , 0.8533749 , 0.8739984 , 0.        , 0.97969407],
       [1.        , 0.63581806, 0.7951289 , 0.        , 0.88310444],
       [0.82491845, 1.        , 0.6478972 , 0.        , 0.8846024 ],
       [1.        , 0.79563105, 0.55089736, 0.        , 0.90971696]])

I would like to perform a permutation test of the spatial distance (cosine similarity) of the average group vector. The purpose of that is two identify if the vectors of each group (X, Y) can be considered different or not. I already know how to calculated the spatial distance ex :

from scipy import spatial
Cosin = spatial.distance.cosine(np.mean(X, axis=0)

However, what they have done in this paper is first: randomly divided the vector into two groups second: calculate the spatial distance third: test wether if their cosine value is different from random (with a permutation test)

I don't know how to integrate that into sklearn.model_selection.permutation_test_score if this is the adapted permutation test?

Also, I found http://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/ but in their function, X and Y can't have different shapes...

I may have a solution based on: https://stats.stackexchange.com/questions/330540/how-to-interpret-very-low-similarity-score-of-two-vectors-but-having-significant

import sys
import math, random
from scipy import stats


similarity = lambda x1, x2: sum(xj*xk for xj,xk in zip(x1, x2))/math.sqrt(sum(xj**2 for xj in x1)*sum(xk**2 for xk in x2))

x1 = np.mean(X, axis=0)
x2 = np.mean(Y, axis=0)

s = similarity(x1, x2)

## permutation test
sr = []
for j in list(range(1,10000)):
    concat_arrays = np.concatenate((X, Y), axis=0)
    np.random.shuffle(concat_arrays)
    #put the number of indiv mac or lemur or human
    split = np.split(concat_arrays, [len(x)])
    sr.append(similarity(np.mean(split[0], axis=0), np.mean(split[1], axis=0)))
shape, loc, scale = stats.weibull_min.fit(sr)
## -log10(p)
ej = ((s-loc)/scale)**shape*math.log10(math.exp(1.))
p = 10**(-ej)

What do you think about this proposition? For "len(x)", I don't know if I am supposed to have the same shape as the original array of my two groups?

Comparing vectors from two groups with permutation test and cosine similarity

Answers (1)

Related Questions