Reputation: 2189
I have a list,
a = [1,2,3]
I also have a list of lists, where each inner list has the same length as a:
x=[[1,2,3], [4,5,6], [7,8,9]]
I want to compute the cosine distance between a and each item in x, so I am using this:
from scipy import spatial
distances = [spatial.distance.cosine(a, i) for i in x]
This takes a very long time to execute. I am looking for a more efficient way to do it.
Upvotes: 0
Views: 153
Reputation: 2723
With NumPy, you can do the same computation with vectorized operations (a matrix-vector product and a row-wise norm), which avoids the Python-level loop and is more efficient.
import numpy as np

def cosine_distance(a, x):
    a = np.array(a)
    x = np.array(x)
    # 1 - (x_i . a) / (|a| * |x_i|), computed for every row x_i of x at once
    return 1 - x.dot(a) / (np.linalg.norm(a) * np.linalg.norm(x, axis=1))
Comparing the execution time on the example data with IPython's %timeit:
%timeit [spatial.distance.cosine(a, i) for i in x]
140 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit cosine_distance(a, x)
27.8 µs ± 315 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
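Before relying on the faster version, it may be worth checking that it returns the same values as the original loop. The sketch below does that on the example data, and also tries scipy's own cdist as another loop-free option. The variable names (loop_result, vectorized, cdist_result) are made up for illustration, and the use of np.asarray with a float dtype is an assumption on my part, not something from the code above.

```python
import numpy as np
from scipy.spatial import distance

def cosine_distance(a, x):
    # Convert inputs to float arrays (assumption: all rows of x have len(a) entries)
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    # Dot product of every row with a, divided by the product of the norms
    return 1 - x.dot(a) / (np.linalg.norm(a) * np.linalg.norm(x, axis=1))

a = [1, 2, 3]
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Original approach: one scipy call per row
loop_result = np.array([distance.cosine(a, row) for row in x])

# Vectorized NumPy approach
vectorized = cosine_distance(a, x)

# scipy alternative: cdist expects 2-D inputs, so a is wrapped in a list;
# the result has shape (1, len(x)), hence the [0]
cdist_result = distance.cdist([a], x, metric="cosine")[0]

print(np.allclose(loop_result, vectorized))
print(np.allclose(loop_result, cdist_result))
```

Both checks should print True if the three approaches agree. Note that any all-zero row in x would make the norm zero and lead to a division by zero, so real data may need that case handled separately.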
Upvotes: 2