I want to compare the similarity of some texts to detect duplicates, but if I use `difflib`, it returns different ratios depending on the order in which I pass the data.
Here is a random example:
```python
import difflib

a = 'josephpFRANCES'
b = 'ABswazdfsadSASAASASASAS'

seq = difflib.SequenceMatcher(None, a, b)
d = seq.ratio() * 100
print(d)   # d = 16.216216216216218

seq2 = difflib.SequenceMatcher(None, b, a)
d2 = seq2.ratio() * 100
print(d2)  # d2 = 10.81081081081081
```
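`SequenceMatcher.ratio()` is not symmetric, so swapping `a` and `b` can naturally yield different results; the `difflib` documentation itself cautions that the result of `ratio()` may depend on the order of the arguments. The ratio is `2.0*M/T`, where `T` is the total number of characters in both strings and `M` is the number of matched characters. `T` is the same in either order (14 + 23 = 37 here), so the difference comes from `M`: the matcher greedily takes the longest matching block (breaking ties by the earliest position in the first sequence, then in the second) and then recurses on the pieces to the left and right of that block. In your example no two consecutive characters are shared, so every match has length 1. With `a` first, the matcher pairs up `s`, `A` and `S`, so `M` = 3 and the ratio is 6/37 ≈ 16.2%. With `b` first, it starts by pairing the leading `A` of `b` with the `A` in `a`; because `s` comes before that `A` in `a` but after it in `b`, `s` can no longer be matched, leaving `M` = 2 and a ratio of 4/37 ≈ 10.8%.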
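To see where the two numbers come from, here is a minimal sketch that recomputes `ratio()` from difflib's documented formula `2.0*M/T` for both argument orders, using only `SequenceMatcher.get_matching_blocks()` and `ratio()`. The helper names `matched_chars`, `explain_ratio` and `symmetric_ratio` are hypothetical, not part of `difflib`. Taking the larger of the two orders in `symmetric_ratio` is just one assumed way to get an order-independent score for duplicate detection, not the only or an official one.

```python
import difflib


def matched_chars(first: str, second: str) -> int:
    """Hypothetical helper: M, the number of characters SequenceMatcher pairs up."""
    sm = difflib.SequenceMatcher(None, first, second)
    # get_matching_blocks() returns Match(a, b, size) triples; the last one is a
    # dummy of size 0, so summing all sizes gives M.
    return sum(block.size for block in sm.get_matching_blocks())


def explain_ratio(first: str, second: str) -> float:
    """Hypothetical helper: recompute ratio() as 2.0 * M / T."""
    m = matched_chars(first, second)
    t = len(first) + len(second)  # T does not depend on argument order
    ratio = 2.0 * m / t
    # Sanity check against what SequenceMatcher reports itself.
    assert abs(ratio - difflib.SequenceMatcher(None, first, second).ratio()) < 1e-12
    return ratio


def symmetric_ratio(x: str, y: str) -> float:
    """Hypothetical helper: order-independent score (best of both orders)."""
    return max(difflib.SequenceMatcher(None, x, y).ratio(),
               difflib.SequenceMatcher(None, y, x).ratio())


a = 'josephpFRANCES'
b = 'ABswazdfsadSASAASASASAS'

for first, second in ((a, b), (b, a)):
    sm = difflib.SequenceMatcher(None, first, second)
    blocks = [blk for blk in sm.get_matching_blocks() if blk.size]
    print(f"M={matched_chars(first, second)}, T={len(first) + len(second)}, "
          f"ratio={explain_ratio(first, second):.4f}, blocks={blocks}")
# M=3, T=37, ratio=0.1622, blocks=[Match(a=2, b=2, size=1), Match(a=9, b=12, size=1), Match(a=13, b=13, size=1)]
# M=2, T=37, ratio=0.1081, blocks=[Match(a=0, b=9, size=1), Match(a=11, b=13, size=1)]

print(symmetric_ratio(a, b) == symmetric_ratio(b, a))  # True
```

The printed blocks show that `T` stays at 37 while the set of matched characters changes with the order, which is exactly what moves the ratio.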