Reputation: 7490
I have a NumPy matrix, say a, as below:
array([[1, 2, 3],
[1, 2, 2]])
I want to find the cosine similarity matrix of this matrix, where the cosine similarity is computed between the columns.
The cosine similarity of two vectors is just their dot product divided by the product of their L2 norms.
But I don't want to loop over the columns to compute it.
So I first tried this:
from scipy.spatial import distance
cos=distance.cdist(a.T,a.T,'cosine')
Here I am taking the transpose, as otherwise it would compute the cosine between rows (observations). I want it for columns.
However, I am not sure this is the right answer. The doc of this function says it gives 1 - cosine_similarity. So should I then do this?
cos = 1 - distance.cdist(a.T, a.T, 'cosine')
Please advise.
II)
Also, what if I try something like this:
cos=(np.dot(a.T,a))/(np.linalg.norm(a, axis=0, keepdims=True))*(np.linalg.norm(a, axis=0, keepdims=True))
It won't work, because there is some problem in getting the right L2 norm for the right column. Any idea how to implement this without a library function?
Upvotes: 0
Views: 1650
Reputation: 294258
Try this:
import numpy as np
a = np.array([[1, 2, 3], [1, 2, 2]])
n = np.linalg.norm(a, axis=0).reshape(1, a.shape[1])
a.T.dot(a) / n.T.dot(n)
array([[ 1. , 1. , 0.98058068],
[ 1. , 1. , 0.98058068],
[ 0.98058068, 0.98058068, 1. ]])
This assignment for n would also have worked:
n = np.linalg.norm(a, axis=0)[None, :]
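As a sanity check on part I of the question, here is a minimal sketch that compares the manual computation with 1 - cdist. It assumes SciPy is installed, and the names manual and via_cdist are only illustrative. It also shows the parentheses that the one-liner in part II appears to be missing.

```python
import numpy as np
from scipy.spatial import distance

a = np.array([[1, 2, 3], [1, 2, 2]])

# Column norms as a (1, 3) row vector.
n = np.linalg.norm(a, axis=0, keepdims=True)

# Manual version: pairwise dot products of the columns, divided by the
# outer product of the column norms. The parentheses around (n.T * n)
# matter: writing "/ n * n" divides and then multiplies elementwise,
# which cancels the normalization instead of applying it.
manual = a.T.dot(a) / (n.T * n)

# cdist with 'cosine' returns the cosine distance, i.e. 1 - similarity,
# so subtracting from 1 recovers the similarity matrix.
via_cdist = 1 - distance.cdist(a.T, a.T, 'cosine')

print(np.allclose(manual, via_cdist))
```

If the two agree, then 1 - distance.cdist(a.T, a.T, 'cosine') is the similarity matrix asked about in part I.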
Upvotes: 1