Reputation: 945
Say I have a matrix mat, a 100 x 200 array.
My question is twofold:
How can I compute the cosine similarity of the first row against all the other rows? I tried using sklearn's cosine_similarity function, but passing in a 100 x 200 matrix gives me a 100 x 100 array (instead of a 100 x 1 array).
If I wanted to compute the cosine similarities of all the rows against each other, i.e. all 100 C 2 = 4950 different pairs of rows, would it be faster to skip something like sklearn, store the norm of each row with np.linalg.norm, and then compute each similarity as cos_sim = dot(a, b)/(norm(a)*norm(b))?
Upvotes: 0
Views: 921
Reputation: 136
1- Try:
cosines = (numpy.inner(mat[0], mat) / (numpy.linalg.norm(mat[0]) * numpy.linalg.norm(mat, axis=1)))
2- You can reuse the code above for this. numpy.linalg.norm(mat, axis=1) computes the norms of all the row vectors at once, so at each step you only multiply them by the norm of the current row. Also, numpy.inner(mat, mat) gives you the symmetric matrix of inner products between every pair of rows.
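Putting both points together, here is a minimal sketch, assuming mat is a NumPy array with no all-zero rows (a zero row would cause a division by zero). The random data, the seed, and the names row_norms, first_vs_all, sim and pair_sims are only illustrative and not from the question.

```python
import numpy as np

# Illustrative stand-in for the 100 x 200 matrix from the question.
rng = np.random.default_rng(0)
mat = rng.standard_normal((100, 200))

# Norm of every row, computed once and reused below; shape (100,).
row_norms = np.linalg.norm(mat, axis=1)

# Part 1: first row against every row (including itself); shape (100,).
first_vs_all = np.inner(mat[0], mat) / (row_norms[0] * row_norms)

# Part 2: all rows against all rows. np.inner(mat, mat) holds every pairwise
# dot product; dividing elementwise by the outer product of the norms turns
# each entry into a cosine similarity. Result is symmetric, shape (100, 100).
sim = np.inner(mat, mat) / np.outer(row_norms, row_norms)

# Keep only the distinct pairs (i < j): 100 C 2 = 4950 values.
i, j = np.triu_indices(mat.shape[0], k=1)
pair_sims = sim[i, j]
```

Here sim[i[k], j[k]] is the similarity between rows i[k] and j[k], and the first row of sim matches first_vs_all. Whether this beats sklearn in practice is something you would need to time on your own data.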
Upvotes: 1