Reputation: 4117
I have a list of unit tensors (length = 1). The list contains ~20,000 such tensors. Each tensor has ~3,000 dimensions but is very sparse: only a fraction x (0 < x < 1) of the dimensions are nonzero. I need to compute the cosine similarity between all pairs of these tensors. What is the most efficient way to do this? (This is not an NLP task, but my solution looks similar to the word2vec approach, which is why I added the NLP tag. My tensors have more dimensions than word2vec vectors and are sparser.)
Upvotes: 0
Views: 820
Reputation: 5301
scikit-learn's cosine_similarity is your friend:
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
# example test:
T = sparse.rand(4, 3, 0.9)
cosine_similarity(T)
# full run (same shape as described in the question; sparse.rand defaults to density=0.01):
T = sparse.rand(20000, 3000)
# %time is an IPython/Jupyter magic
%time cosine_similarity(T)
Takes about 4.4 seconds on my machine.
# staying sparse:
%time cosine_similarity(T, dense_output=False)
Takes less than 2 seconds on my machine (i.e., roughly a 2x speedup).
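Since the question says the tensors already have unit length, the cosine similarity of any two of them is just their dot product, so the whole similarity matrix can be obtained from a single sparse matrix product. Here is a minimal sketch of that idea, not a benchmarked solution. The matrix size, density and random seed are assumptions chosen to keep the example small, and the normalize call only stands in for the fact that the real data is already unit-norm.

```python
from scipy import sparse
from sklearn.preprocessing import normalize
from sklearn.metrics.pairwise import cosine_similarity

# Small stand-in for the 20000 x 3000 matrix from the question
# (shape, density and seed are assumptions for illustration).
T = sparse.random(200, 300, density=0.01, format="csr", random_state=0)

# Scale every row to unit L2 norm; the tensors in the question
# are already unit length, so this step would be unnecessary there.
T_unit = normalize(T, norm="l2", axis=1)

# For unit-norm rows, the Gram matrix is the cosine-similarity matrix.
sim = T_unit @ T_unit.T  # result stays sparse

# Sanity check against scikit-learn's own implementation.
reference = cosine_similarity(T, dense_output=False)
max_diff = abs(sim - reference).max()
```

Keeping the result sparse matters at full scale: a dense 20000 x 20000 float64 matrix takes about 3.2 GB of memory.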
Upvotes: 0
Reputation: 778
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html
from sklearn.metrics.pairwise import cosine_similarity
# vector1 and vector2 must be 2D, with shape (n_samples, n_features)
cos_sim = cosine_similarity(vector1, vector2)
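Note that cosine_similarity expects 2D inputs of shape (n_samples, n_features), so two plain 1D vectors have to be reshaped into single-row matrices first. A small sketch, with made-up example values for vector1 and vector2:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Example vectors (values are assumptions for illustration).
vector1 = np.array([1.0, 0.0, 0.0])
vector2 = np.array([1.0, 1.0, 0.0])

# Reshape each 1D vector into a (1, n_features) row matrix.
cos_sim = cosine_similarity(vector1.reshape(1, -1), vector2.reshape(1, -1))

# The result is a 1 x 1 matrix; pull out the scalar.
value = cos_sim[0, 0]
```

For comparing all rows of one matrix with each other, as in the question, passing the single matrix is enough and no reshaping is needed.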
Upvotes: 1