Reputation: 3008
I am trying to find a simple way to calculate soft cosine similarity between two sentences.
Here is my attempt so far:
from gensim.matutils import softcossim
sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()
print(softcossim(sent_1, sent_2, similarity_matrix))
I don't understand what similarity_matrix is supposed to be. Please help me build it, and from there compute the soft cosine similarity in Python.
Upvotes: 1
Views: 8314
Reputation: 647
You can use the SoftCosineSimilarity class from gensim.similarities in Gensim 4.0.0 and later:
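from gensim.similarities import SoftCosineSimilarity

# Calculate soft cosine similarity between the query and the documents.
# Assumes `dictionary` (a gensim Dictionary) and `similarity_matrix`
# (a SparseTermSimilarityMatrix) are already defined, and that `query`
# and each document are lists of tokens.
def find_similarity(query, documents):
    query = dictionary.doc2bow(query)
    index = SoftCosineSimilarity(
        [dictionary.doc2bow(document) for document in documents],
        similarity_matrix)
    similarities = index[query]
    return similarities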
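The function above passes every token list through dictionary.doc2bow before indexing it. As a rough illustration of the bag-of-words format that call returns (a sorted list of (token_id, count) pairs, with unknown tokens dropped), here is a minimal pure-Python sketch. The helpers build_token_ids and to_bow are hypothetical names made up for this example, and the ids they assign will not necessarily match the ones Gensim's Dictionary would choose.

```python
from collections import Counter

def build_token_ids(documents):
    """Assign an integer id to each distinct token in first-seen order.

    Hypothetical helper for illustration; not Gensim's implementation.
    """
    token2id = {}
    for doc in documents:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id

def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs, skipping tokens without an id."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())

docs = ["leo is a cricket player".split(), "dravid is a batsman".split()]
token2id = build_token_ids(docs)

# "fan" is not in the vocabulary, so it is silently dropped.
bow = to_bow("a cricket player is a fan".split(), token2id)
print(bow)
```

The point to take away is that words outside the dictionary contribute nothing to the similarity, so the dictionary should be built from all the documents you intend to compare.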
Upvotes: 0
Reputation: 140
As of the current version of Gensim, 3.8.3, some of the method calls in both the question and the previous answers are deprecated, and those deprecated functions have been removed from the 4.0.0 beta. I can't seem to include code in a reply to @EliadL, so I'm posting this separately.
The current method for solving this problem in Gensim 3.8.3 and 4.0.0 is as follows:
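import gensim.downloader as api
from gensim import corpora
from gensim.similarities import SparseTermSimilarityMatrix
# WordEmbeddingSimilarityIndex is exported from gensim.similarities in 4.x
# and from gensim.models in 3.8.x, so try both locations.
try:
    from gensim.similarities import WordEmbeddingSimilarityIndex
except ImportError:
    from gensim.models import WordEmbeddingSimilarityIndex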
sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()
# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)
# Prepare the similarity matrix
similarity_index = WordEmbeddingSimilarityIndex(fasttext_model300)
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)
# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)
# Compute soft cosine similarity
print(similarity_matrix.inner_product(sent_1, sent_2, normalized=True))
#> 0.68463486
For users of Gensim 3.8.3, I've also found this Notebook helpful for understanding soft cosine similarity and how to apply it with Gensim.
As of now, for users of Gensim 4.0.0 beta this Notebook is the one to look at.
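To make the role of similarity_matrix more concrete, here is a minimal pure-Python sketch of the quantity usually called soft cosine: the inner product a^T S b divided by sqrt(a^T S a) * sqrt(b^T S b), where S holds term-to-term similarities. The function names, the toy three-word vocabulary, and the 0.5 similarity value are all assumptions made for illustration; this is not Gensim's implementation, which works on sparse matrices.

```python
import math

def soft_inner(bow_a, bow_b, sim):
    """Compute a^T S b for sparse (id, weight) vectors and a dense matrix S."""
    return sum(wa * sim[i][j] * wb for i, wa in bow_a for j, wb in bow_b)

def soft_cosine(bow_a, bow_b, sim):
    """Normalize the soft inner product by the soft norms of both vectors."""
    denom = math.sqrt(soft_inner(bow_a, bow_a, sim)) * math.sqrt(soft_inner(bow_b, bow_b, sim))
    return soft_inner(bow_a, bow_b, sim) / denom if denom else 0.0

# Toy vocabulary (assumed): 0 = "batsman", 1 = "cricket", 2 = "keeper".
# The off-diagonal 0.5 says "batsman" and "cricket" are related terms.
related = [[1.0, 0.5, 0.0],
           [0.5, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

a = [(0, 1)]  # a document containing only "batsman"
b = [(1, 1)]  # a document containing only "cricket"

print(soft_cosine(a, b, identity))  # ordinary cosine: no shared words
print(soft_cosine(a, b, related))   # soft cosine: related words count
```

With the identity matrix this reduces to ordinary cosine similarity, so two documents with no words in common score zero. A similarity matrix built from word embeddings is what lets documents that use different but related words score above zero.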
Upvotes: 3
Reputation: 7088
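Going by this tutorial (written for Gensim 3.x; softcossim and the model's similarity_matrix method were removed in Gensim 4.0.0):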
import gensim.downloader as api
from gensim import corpora
from gensim.matutils import softcossim
sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()
# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)
# Prepare the similarity matrix
similarity_matrix = fasttext_model300.similarity_matrix(dictionary)
# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)
# Compute soft cosine similarity
print(softcossim(sent_1, sent_2, similarity_matrix))
#> 0.7909639717134869
Upvotes: 2