Shalini Baranwal

Reputation: 3008

Soft Cosine Similarity between two sentences

I am trying to find a simple way to calculate soft cosine similarity between two sentences.

Here is my attempt so far:

from gensim.matutils import softcossim

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

print(softcossim(sent_1, sent_2, similarity_matrix))

I don't understand what similarity_matrix should be. How do I build it, and then compute the soft cosine similarity in Python?

Upvotes: 1

Views: 8314

Answers (3)

Asantha Thilina

Reputation: 647

In Gensim 4.0.0 and later, you can use the SoftCosineSimilarity class from gensim.similarities:

from gensim.similarities import SoftCosineSimilarity

# Calculate the soft cosine similarity between the query and each document.
# Assumes `dictionary` (a gensim.corpora.Dictionary) and `similarity_matrix`
# (a SparseTermSimilarityMatrix) have already been built.
def find_similarity(query, documents):
    query = dictionary.doc2bow(query)
    index = SoftCosineSimilarity(
        [dictionary.doc2bow(document) for document in documents],
        similarity_matrix)
    similarities = index[query]
    return similarities

Upvotes: 0

magiclantern

Reputation: 140

As of Gensim 3.8.3, the current version at the time of writing, some of the calls used in the question and in the earlier answers are deprecated, and they have been removed from the 4.0.0 beta. I couldn't post code in a reply to @EliadL, so I'm adding a new answer.

The current method for solving this problem in Gensim 3.8.3 and 4.0.0 is as follows:

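import gensim.downloader as api
from gensim import corpora
from gensim.similarities import SparseTermSimilarityMatrix
# In Gensim 4.x this class lives in gensim.similarities;
# in 3.8.3, import it from gensim.models instead.
from gensim.similarities import WordEmbeddingSimilarityIndex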

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix
similarity_index = WordEmbeddingSimilarityIndex(fasttext_model300)
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

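# Compute soft cosine similarity.
# Gensim 3.8.3 takes a boolean here; the 4.x releases document this
# argument as a tuple, e.g. normalized=(True, True).
print(similarity_matrix.inner_product(sent_1, sent_2, normalized=True))
#> 0.68463486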

For users of Gensim 3.8.3, I also found this Notebook helpful for understanding soft cosine similarity and how to apply it with Gensim.

For users of the Gensim 4.0.0 beta, this Notebook is currently the one to look at.
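For intuition about what similarity_matrix holds, here is a minimal pure-Python sketch of the quantity a normalized inner product of this kind computes: a^T S b / sqrt((a^T S a)(b^T S b)), where a and b are bag-of-words count vectors and S[i][j] is the similarity between term i and term j. This is not Gensim's implementation. The three-word vocabulary, the similarity values in S, and the soft_cosine helper are all made up for illustration.

```python
import math

def soft_cosine(a, b, S):
    """Soft cosine of dense count vectors a and b under term-similarity matrix S."""
    def inner(x, y):
        # Generalized inner product x^T S y
        return sum(x[i] * S[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom else 0.0

# Hypothetical vocabulary: index 0 = "batsman", 1 = "keeper", 2 = "cricket"
# Hypothetical term similarities: "batsman" and "keeper" are assumed to be
# 0.5 similar; every term is 1.0 similar to itself.
S = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [1, 0, 1]  # counts for "batsman cricket"
doc_b = [0, 1, 1]  # counts for "keeper cricket"

# With the identity matrix, soft cosine reduces to ordinary cosine similarity.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
hard = soft_cosine(doc_a, doc_b, identity)
soft = soft_cosine(doc_a, doc_b, S)
print(hard, soft)
```

The soft score comes out higher than the ordinary cosine because the related terms "batsman" and "keeper" now contribute to the numerator. In the Gensim code above, the word-embedding similarity index plays the role of filling in S from the FastText vectors.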

Upvotes: 3

EliadL

Reputation: 7088

Going by this tutorial:

import gensim.downloader as api
from gensim import corpora
from gensim.matutils import softcossim  # Gensim 3.x only; removed in 4.0

sent_1 = 'Dravid is a cricket player and a opening batsman'.split()
sent_2 = 'Leo is a cricket player too He is a batsman,baller and keeper'.split()

# Download the FastText model
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')

# Prepare a dictionary and a corpus.
documents = [sent_1, sent_2]
dictionary = corpora.Dictionary(documents)

# Prepare the similarity matrix
similarity_matrix = fasttext_model300.similarity_matrix(dictionary)

# Convert the sentences into bag-of-words vectors.
sent_1 = dictionary.doc2bow(sent_1)
sent_2 = dictionary.doc2bow(sent_2)

# Compute soft cosine similarity
print(softcossim(sent_1, sent_2, similarity_matrix))
#> 0.7909639717134869

Upvotes: 2
