Reputation: 990
Is there a way to determine the similarity of multiple given text instances, perhaps as a percentage or in some other way that shows how much the texts have in common with each other?
T1 = abcabcabc
T2 = xyzabcxyzabcxyz
T3 = abcxyzabc
Similarity would be something like:
*abc*abc* or maybe 66%
I can't be more specific at the moment.
If code is provided I prefer Python, but any scripting language or similar is fine, as is pseudocode or a reference to problem-solving sites.
Upvotes: 0
Views: 170
Reputation: 1220
There are various methods for measuring the distance between texts.
See "String metric" for more detail.
There is also an implementation of Levenshtein distance on PyPI, though I haven't tried it myself.
There is another one on the wiki.
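Since Levenshtein distance is the minimum number of single-character edits needed to convert one string into another, steps / len(string) gives the fraction of the string that differs. To get the similarity percentage of two strings, use 1 - steps / max(len(a), len(b)) and multiply by 100.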
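Here is a minimal sketch of that idea in plain Python, in case it helps. The function names `levenshtein` and `similarity` are my own and not taken from the PyPI package, and dividing by the length of the longer string is just one common way to normalize, so treat it as an assumption rather than the definitive formula.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means identical strings.
    Normalizing by the longer length is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    texts = {
        "T1": "abcabcabc",
        "T2": "xyzabcxyzabcxyz",
        "T3": "abcxyzabc",
    }
    # Compare every pair of texts from the question.
    for (name_a, a), (name_b, b) in combinations(texts.items(), 2):
        print(f"{name_a} vs {name_b}: {similarity(a, b):.0%}")
```

For T1 and T3 from the question the distance is 3 (the middle `abc` has to become `xyz`), so the similarity comes out at 1 - 3/9, roughly 67%. Note that this compares two strings at a time. It does not produce a shared pattern like `*abc*abc*` across all three.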
Upvotes: 1