Determining similarity between multiple text blocks

Question

Is there a way to determine the similarity of multiple given text instances, as a percentage or in some other form that shows how much the texts have in common with each other?

T1 = abcabcabc
T2 = xyzabcxyzabcxyz
T3 = abcxyzabc

Similarity would be something like:

*abc*abc* or maybe 66%

I can't be more specific at the moment.

If code is provided, I prefer Python, but any scripting language or similar is fine, as is pseudocode or a reference to problem-solving sites.

KIDJourney · Accepted Answer

There are several methods for measuring the distance between texts.

See String metric for more detail.

There is an implementation of Levenshtein distance on PyPI, though I haven't tried it myself.

There is another one on the wiki.

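Since Levenshtein distance calculates the minimum number of steps needed to convert one string into another, you can use steps / len(string) to turn it into a percentage. Note that this ratio measures how different the two strings are, so the similarity is 1 - steps / len(string), with len(string) taken from the longer of the two strings.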
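As a rough sketch of the idea above, here is a pure-Python Levenshtein distance with a normalized similarity built on top of it. The function names `levenshtein` and `similarity` are made up for this example and are not the API of the PyPI package mentioned earlier. Dividing by the length of the longer string is one possible normalization, not the only one.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string as b so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j chars of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumes normalization by the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


texts = {"T1": "abcabcabc", "T2": "xyzabcxyzabcxyz", "T3": "abcxyzabc"}

# Compare every pair of texts from the question.
for (name1, text1), (name2, text2) in combinations(texts.items(), 2):
    print(f"{name1} vs {name2}: {similarity(text1, text2):.0%}")
```

For T1 and T3 the distance is 3 (the middle abc becomes xyz), so the similarity is 1 - 3/9, which is about 67% and close to the figure guessed in the question. For more than two texts, the pairwise scores could be averaged, though that is only one way to summarize them.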
