Reputation: 73
The code below is very inefficient for large matrices. Is there a better way to implement this?
I have already searched the web for this here.
import numpy as np

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.sqrt(np.dot(x, x)) * np.sqrt(np.dot(y, y)))

def compare(a, b):
    c = np.zeros((a.shape[0], b.shape[0]))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i, j] = cosine_similarity(ai, bj)
    return c

a = np.random.rand(100, 2000)
b = np.random.rand(800, 2000)
compare(a, b)  # shape -> (100, 800)
Upvotes: 2
Views: 1664
Reputation: 73
[Personal edit]
In order to compute the cosine similarity efficiently, here is a solution I have written:
def compare(a, b):
    x = np.atleast_2d(np.sqrt(np.sum(a*a, axis=1))).T
    y = np.atleast_2d(np.sqrt(np.sum(b*b, axis=1))).T
    return a.dot(b.T) / x.dot(y.T)
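One way to sanity-check a vectorized version like this is to compare it against the double loop on small random inputs. Below is a minimal sketch, assuming both inputs are 2-D arrays with no all-zero rows (a zero row would cause a division by zero). The names compare_loop and compare_vectorized are made up for illustration, and the vectorized variant normalizes the rows first instead of dividing by the outer product of the norms, which should give the same result up to rounding.

```python
import numpy as np

def compare_loop(a, b):
    # Reference implementation: one cosine similarity per (row of a, row of b) pair.
    c = np.zeros((a.shape[0], b.shape[0]))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i, j] = np.dot(ai, bj) / (np.sqrt(np.dot(ai, ai)) * np.sqrt(np.dot(bj, bj)))
    return c

def compare_vectorized(a, b):
    # Scale every row to unit length (assumes no row is all zeros),
    # then a single matrix product yields all pairwise cosine similarities.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

rng = np.random.default_rng(0)
a = rng.random((5, 20))
b = rng.random((8, 20))
result = compare_vectorized(a, b)

# Both implementations should agree up to floating-point rounding.
print(np.allclose(result, compare_loop(a, b)))
```

Small shapes keep the loop version fast enough to serve as a reference; the same check can be rerun on the original (100, 2000) and (800, 2000) inputs.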
Upvotes: 0
Reputation: 73
As noted in the comments, if you want the product of two matrices, NumPy already has an efficient implementation of this, though it may still be too slow for you, since multiplying two n-by-n matrices is O(n^3).
import numpy as np
a=np.array([3,2,1])
b=np.array([1,2,3])
c=a.dot(b)
print(c) #output = 10
I saw in the comments that you were interested in the cosine distance between vectors. For that, consider using SciPy. Note that scipy.spatial.distance.cosine returns the cosine distance, i.e. 1 minus the cosine similarity:
from scipy.spatial.distance import cosine
a=[1,0,1]
b=[0,1,0]
print(cosine(a,b)) #output = 1.0
This might be faster for your needs. Here is the documentation.
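Since cosine works on one pair of vectors at a time, it would still need a double loop for the matrices in the question. A possible alternative, sketched here under the assumption that SciPy is installed and that no row is all zeros, is scipy.spatial.distance.cdist with metric="cosine", which computes all pairwise cosine distances in one call. The variable names dist and sim are chosen only for this example.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
a = rng.random((100, 2000))
b = rng.random((800, 2000))

# cdist returns cosine *distances* for every (row of a, row of b) pair.
dist = cdist(a, b, metric="cosine")

# Convert distance to similarity: similarity = 1 - distance.
sim = 1.0 - dist
print(sim.shape)
```

Whether this is faster than a plain NumPy matrix product depends on the sizes involved, so it is worth timing both on real data.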
Upvotes: 1