I want to compute the cosine similarity of two vectors in PySpark, like
1 - spatial.distance.cosine(xvec, yvec)
but SciPy does not seem to support the pyspark.ml.linalg.Vector type.
You can use the dot and norm methods to calculate this easily:
from pyspark.ml.linalg import Vectors
x = Vectors.dense([1,2,3])
y = Vectors.dense([2,3,5])
1 - x.dot(y)/(x.norm(2)*y.norm(2))  # cosine distance, like scipy's cosine(); drop "1 -" for the similarity
# 0.0028235350472619603
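For reference, this is the standard definition the dot/norm expression implements. Note that the snippet returns the cosine distance d, matching scipy.spatial.distance.cosine; the similarity asked for in the question is just the fraction on its own. Worked through for the example vectors:

$$
\cos(\mathbf{x},\mathbf{y}) = \frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert_2\,\lVert\mathbf{y}\rVert_2},
\qquad d(\mathbf{x},\mathbf{y}) = 1 - \cos(\mathbf{x},\mathbf{y})
$$

$$
\mathbf{x}\cdot\mathbf{y} = 1\cdot 2 + 2\cdot 3 + 3\cdot 5 = 23,\quad
\lVert\mathbf{x}\rVert_2 = \sqrt{14},\quad
\lVert\mathbf{y}\rVert_2 = \sqrt{38},\quad
d = 1 - \frac{23}{\sqrt{532}} \approx 0.0028235
$$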
The same result with SciPy:
import numpy as np
from scipy.spatial.distance import cosine
x = np.array([1,2,3])
y = np.array([2,3,5])
cosine(x, y)
# 0.0028235350472619603