Reputation: 4117

Effective way to compute cosine similarity for sparse tensors in python?

I have a list of unit tensors (length = 1). The list contains ~20,000 such tensors. Each tensor has ~3,000 dimensions but is very sparse: only a fraction x (0 < x < 1) of the dimensions are nonzero. I need to compute the cosine similarity between all pairs of these tensors. What is the most efficient way to do this? (This is not an NLP task, but my solution looks similar to the word2vec approach, which is why I added the NLP tag. My tensors have more dimensions than word2vec vectors and are sparser.)

Upvotes: 0

Answers (2)

fnl

Reputation: 5301

scikit-learn's cosine_similarity is your friend:

from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# example test:
T = sparse.rand(4, 3, 0.9)
cosine_similarity(T)

# full run (matrix sized as in the question; sparse.rand defaults to density=0.01):
T = sparse.rand(20000, 3000)
# %time is an IPython/Jupyter magic, not plain Python
%time cosine_similarity(T)

Takes about 4.4 seconds on my machine.

# staying sparse:
%time cosine_similarity(T, dense_output=False)

Takes less than 2 seconds on my machine (i.e., roughly a 2x speedup).
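
Since the vectors in the question already have unit length, cosine similarity reduces to a plain dot product, so a sparse matrix product of the data with its own transpose should give the same result. A minimal sketch, assuming the data is held as a SciPy CSR matrix with one vector per row; the random matrix, the reduced row count, and the explicit normalization step are illustrative stand-ins, not the asker's data:

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

# Illustrative stand-in data: 200 sparse rows with 3000 features each.
T = sparse.random(200, 3000, density=0.01, format="csr", random_state=0)

# Scale every row to unit L2 norm, matching the "unit tensors" in the question.
T = normalize(T, norm="l2", axis=1)

# For unit-length rows, the pairwise dot products are the cosine similarities.
S = T @ T.T  # stays sparse

# Cross-check against scikit-learn's implementation.
reference = cosine_similarity(T, dense_output=False)
print(np.allclose(S.toarray(), reference.toarray()))
```

If the rows are not already normalized, the normalize call is required for the dot product to equal the cosine similarity.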

Upvotes: 0

Tilak Putta

Reputation: 778

See the scikit-learn documentation for the cosine_similarity function:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html

In Python:

from sklearn.metrics.pairwise import cosine_similarity
# vector1 and vector2 must be 2-D, with shape (n_samples, n_features)
cos_sim = cosine_similarity(vector1, vector2)
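
Note that cosine_similarity expects 2-D inputs of shape (n_samples, n_features), so a single 1-D vector has to be reshaped into one row first. A small sketch with made-up example values for vector1 and vector2 (they are not from the question):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical example vectors; vector2 is a scaled copy of vector1.
vector1 = np.array([1.0, 0.0, 2.0])
vector2 = np.array([2.0, 0.0, 4.0])

# reshape(1, -1) turns each 1-D vector into a single-row 2-D array.
cos_sim = cosine_similarity(vector1.reshape(1, -1), vector2.reshape(1, -1))

# The result is a (1, 1) matrix; pull out the scalar similarity.
value = cos_sim[0, 0]
print(value)
```

Passing several rows in either argument returns the full matrix of pairwise similarities between the two sets.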

Upvotes: 1
