Reputation: 63359
I want to get the similarity percentage of two words, e.g.:
abcd versus zzabcdzz == 50% similarity
It doesn't need to be very accurate. Is there any way to do that? I am using Python, but feel free to recommend other languages.
Upvotes: 3
Views: 3208
Reputation: 96051
Copying from that answer:
In Python, there is difflib, which offers the SequenceMatcher class; it can be used to give you a similarity ratio. Example function:
import difflib
def text_compare(text1, text2, isjunk=None):
    # ratio() returns a float in [0, 1]; 1.0 means the sequences are identical
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()
Upvotes: 0
Reputation: 1360
You could use Python's built-in difflib module.
Here's an example from that page:
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
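For the two words in the question, a minimal sketch along the same lines might look like this. The helper name similarity_percent is made up for illustration; only SequenceMatcher and ratio() come from difflib. Note that ratio() is computed as 2*M/T (M matching characters, T total characters in both strings), so it gives about 67% here rather than the 50% in the question.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Hypothetical helper: difflib's ratio scaled to a percentage."""
    # ratio() = 2 * matches / (len(a) + len(b)), a float in [0, 1]
    return 100.0 * SequenceMatcher(None, a, b).ratio()


# "abcd" matches 4 characters out of 4 + 8 = 12 in total: 2 * 4 / 12
print(round(similarity_percent("abcd", "zzabcdzz"), 1))  # 66.7
```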
Upvotes: 3
Reputation: 3574
The NLTK library provides some similarity metrics:
http://www.opendocs.net/nltk/0.9.5/api/nltk.wordnet.similarity-module.html
Upvotes: 1
Reputation: 838984
Try using python-Levenshtein to calculate the edit distance.
The Levenshtein Python C extension module contains functions for fast computation of
- Levenshtein (edit) distance, and edit operations
- string similarity
- approximate median strings, and generally string averaging
- string sequence and set similarity
You can get a rough idea of similarity by dividing the edit distance between the two strings by the length of the longer string and subtracting the result from 1. In your example the edit distance is 4 (four insertions), and the maximum possible edit distance is 8, so the similarity is 1 - 4/8 = 50%.
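If you'd rather not install a C extension, the same idea can be sketched in pure Python. This is not the python-Levenshtein API, just a plain dynamic-programming edit distance; the function names levenshtein and similarity are made up for illustration, and similarity is taken as one minus the distance divided by the longer length.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("abcd", "zzabcdzz"))  # 4
print(similarity("abcd", "zzabcdzz"))   # 0.5
```

This runs in O(len(a) * len(b)) time, which is fine for single words; for large inputs the C extension will be much faster.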
Upvotes: 6