Malavika Venkatesh

Reputation: 49

Find the similarity scores between sentences

I am trying to find similar sentences in my data. My code ranks the matching sentences as Rank 1, 2 and 3, where Rank 1 is the most similar sentence. I used BM25 for this. For example, for Sentence 1: "The person is wearing a red-shirt"

Rank 1 : "the boy is wearing a red shirt"

Rank 2 : "the boy is wearing a shirt"

Rank 3 : "the girl is wearing a dress"

I would also like to get a similarity score for each match, so I can see how similar the sentences actually are. How can I do that?
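BM25 already gives every candidate a numeric relevance score, and the ranks are just those scores sorted, so the score can be printed next to each rank. If the ranking comes from the rank_bm25 package (an assumption; the question does not say which implementation is used), `BM25Okapi.get_scores(tokenized_query)` returns one score per sentence. The from-scratch sketch below assumes lowercase `\w+` tokenization, the common defaults k1 = 1.5 and b = 0.75, and a non-negative IDF variant, so its absolute numbers may differ from other implementations. Unlike the 0-1 ratios in the answer, BM25 scores are unbounded and only comparable for the same query and corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase runs of \w, so "red-shirt" -> ["red", "shirt"]
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document (higher = more relevant to the query)."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            # Non-negative IDF variant: log(1 + (N - n + 0.5) / (n + 0.5))
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b)
            tf_part = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            )
            score += idf * tf_part
        scores.append(score)
    return scores


query = "The person is wearing a red-shirt"
candidates = [
    "the boy is wearing a red shirt",
    "the boy is wearing a shirt",
    "the girl is wearing a dress",
]
for score, sentence in sorted(zip(bm25_scores(query, candidates), candidates), reverse=True):
    print(f"{score:.3f}  {sentence}")
```

With these assumptions the three sentences come out in the same order as the question's Rank 1, 2 and 3.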

Upvotes: 1

Views: 1655

Answers (1)

Rohith Nambiar

Reputation: 3730

You can use SequenceMatcher from the standard-library difflib module:

from difflib import SequenceMatcher
s = SequenceMatcher(None, "the boy is wearing a red shirt", "the boy is wearing a shirt")
print(s.ratio())

Output

0.9285714285714286  # ratio() ranges from 0 to 1; 1 means identical
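SequenceMatcher compares any sequences of hashable items, not just strings, so passing lists of words gives a word-level score instead of a character-level one. The sketch below applies that to the question's example: it scores each candidate against the query sentence and sorts them, so every rank comes with its score. The `\w+` lowercase tokenization is an assumption made here.

```python
import re
from difflib import SequenceMatcher


def words(text):
    # Assumed tokenization: lowercase runs of \w, so "red-shirt" -> ["red", "shirt"]
    return re.findall(r"\w+", text.lower())


def word_similarity(a, b):
    # ratio() = 2 * M / T, with M matching words and T the total word count of both
    return SequenceMatcher(None, words(a), words(b)).ratio()


query = "The person is wearing a red-shirt"
candidates = [
    "the girl is wearing a dress",
    "the boy is wearing a shirt",
    "the boy is wearing a red shirt",
]

ranked = sorted(candidates, key=lambda c: word_similarity(query, c), reverse=True)
for rank, sentence in enumerate(ranked, start=1):
    print(f"Rank {rank}: {word_similarity(query, sentence):.3f}  {sentence}")
```

This prints 0.857 (12/14) for "the boy is wearing a red shirt", 0.769 (10/13) for "the boy is wearing a shirt" and 0.615 (8/13) for "the girl is wearing a dress".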

Or

You can use the thefuzz library:

from thefuzz import fuzz
print(fuzz.ratio("the boy is wearing a red shirt", "the boy is wearing a shirt"))  # 0 to 100; 100 means identical

Or

You can use the jellyfish library:

import jellyfish
jellyfish.levenshtein_distance('jellyfish', 'smellyfish')  # 2 edits

jellyfish.jaro_similarity('jellyfish', 'smellyfish')  # ≈ 0.8963 (called jaro_distance in older releases)

jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')  # 1 (one transposition)
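The distance functions return edit counts rather than scores, so they grow with sentence length. One common convention for turning an edit distance into a 0-1 similarity is 1 - distance / max(len(a), len(b)); that normalization is a choice made here, not a jellyfish function. The sketch below implements plain Levenshtein distance with the standard dynamic-programming recurrence (no jellyfish dependency) and applies it.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1], where 1 means identical strings."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("jellyfish", "smellyfish"))  # 2
print(levenshtein_similarity("the boy is wearing a red shirt", "the boy is wearing a shirt"))
```

The second call deletes "red " (4 characters) from a 30-character sentence, giving 1 - 4/30 ≈ 0.867.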

Most common text-similarity measures, with an explanation of how each one is calculated, are collected here: https://github.com/luozhouyang/python-string-similarity#python-string-similarity
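As an example of the set-based measures in that list, Jaccard similarity compares the sets of words in two sentences: |A ∩ B| / |A ∪ B|, which also lies between 0 and 1. A minimal sketch, assuming lowercase `\w+` word tokens (the linked library may tokenize differently, for example into character shingles):

```python
import re


def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of lowercase word tokens (assumed tokenization)."""
    set_a = set(re.findall(r"\w+", a.lower()))
    set_b = set(re.findall(r"\w+", b.lower()))
    if not set_a and not set_b:
        return 1.0  # treat two empty sentences as identical
    return len(set_a & set_b) / len(set_a | set_b)


print(jaccard_similarity("the boy is wearing a red shirt", "the boy is wearing a shirt"))  # 6/7 ≈ 0.857
```

Because it works on sets, Jaccard ignores word order and repetition, so "a red shirt" and "shirt red a" score 1.0; use a sequence-based measure such as SequenceMatcher when order matters.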

Upvotes: 4
