user3338049

Reputation: 21

Difflib ratio() unexpected results

I tried difflib.get_close_matches(), but I want more fine-grained control, so I set out to do something similar myself. I looked at the source of get_close_matches(): it just computes the ratios and returns the best one (or several).

get_close_matches() uses ratio() and picks the highest result. However, when I call ratio() on each candidate individually, the correct match (the one chosen by get_close_matches()) doesn't have the highest ratio.

I'm not sure if I'm using it incorrectly or what's going on.

Here is a test program showing what's going on:

import difflib

test_string = "ifthecalleridentifieshimselforherselfasinventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthathisorherassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthatheorshewillbecalledback"
test_list =  ["ifthecalleridentifiestheirselfasaninventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthattheirassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthattheywillbecalledback",\
              "2ifthecalleridentifiedtheirselfasaninventorapplicantoranauthorizedrepresentativeoftheassigneeofrecordpatentdataportalshouldbeusedtoverifythecorrespondenceaddressofrecord"]


print ("Ratio of string to first item: ", difflib.SequenceMatcher(None, test_string, test_list[0]).ratio())
print ("Ratio of string to second item:", difflib.SequenceMatcher(None, test_string, test_list[1]).ratio())
print (difflib.get_close_matches(test_string, test_list, n=1, cutoff=0.1))

And here are the results I'm getting:

Ratio of string to first list element:  0.4924114671163575
Ratio of string to second list element: 0.5520169851380042
['ifthecalleridentifiestheirselfasaninventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthattheirassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthattheywillbecalledback']

The ratio for the second element is higher, but get_close_matches() correctly returns the first element.

Upvotes: 0

Views: 36

Answers (1)

Emek

Reputation: 1

get_close_matches() seems to rely on SequenceMatcher.quick_ratio() as well. The relevant loop in the difflib source is:

s = SequenceMatcher()
s.set_seq2(word)
for x in possibilities:
    s.set_seq1(x)
    if s.real_quick_ratio() >= cutoff and \
       s.quick_ratio() >= cutoff and \
       s.ratio() >= cutoff:
        result.append((s.ratio(), x))
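
In this loop, real_quick_ratio() and quick_ratio() are only compared against the cutoff; the score that gets stored is still ratio(). The difflib documentation describes both quick variants as upper bounds on ratio(). A minimal sketch to check that relationship, assuming a hypothetical helper all_ratios() and made-up sample strings (neither is part of difflib or the question):

```python
import difflib

def all_ratios(a, b):
    """Return (real_quick_ratio, quick_ratio, ratio) for the pair (a, b).

    Hypothetical helper for illustration; not part of difflib.
    """
    s = difflib.SequenceMatcher(None, a, b)
    return s.real_quick_ratio(), s.quick_ratio(), s.ratio()

# Made-up sample pairs: same letters in a different order, and a shifted window.
for a, b in [("abcd", "dcba"), ("abcd", "bcde")]:
    rq, q, r = all_ratios(a, b)
    # Each quick variant should be at least as large as the real ratio.
    print(a, b, rq, q, r)
```

Because the quick variants ignore the order of the characters, a high quick_ratio() does not by itself say how get_close_matches() ranks a candidate.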

When I switch your code to quick_ratio(), I get the following:

import difflib

test_string = "ifthecalleridentifieshimselforherselfasinventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthathisorherassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthatheorshewillbecalledback"
test_list =  ["ifthecalleridentifiestheirselfasaninventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthattheirassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthattheywillbecalledback",\
              "2ifthecalleridentifiedtheirselfasaninventorapplicantoranauthorizedrepresentativeoftheassigneeofrecordpatentdataportalshouldbeusedtoverifythecorrespondenceaddressofrecord"]


print ("Ratio of string to first item: ", difflib.SequenceMatcher(None, test_string, test_list[0]).quick_ratio())
print ("Ratio of string to second item:", difflib.SequenceMatcher(None, test_string, test_list[1]).quick_ratio())
print (difflib.get_close_matches(test_string, test_list, n=1, cutoff=0.1))

Output:

Ratio of string to first item:  0.9612141652613828
Ratio of string to second item: 0.7091295116772823
['ifthecalleridentifiestheirselfasaninventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthattheirassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthattheywillbecalledback']
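
The source loop quoted earlier also passes each candidate as the first sequence (set_seq1) and the query as the second (set_seq2), which is the reverse of the order used in the question's test program. The difflib documentation cautions that the result of ratio() may depend on the order of the arguments, and SequenceMatcher's autojunk heuristic only considers the second sequence, and only when it has at least 200 items. Whether that explains the numbers in the question is an assumption that has not been verified here. A rough sketch of a scorer that keeps the same argument order as get_close_matches(); ranked_matches() and its autojunk switch are hypothetical names, not part of difflib:

```python
import difflib

def ranked_matches(word, possibilities, cutoff=0.0, autojunk=True):
    """Score candidates in the same argument order as get_close_matches().

    Hypothetical helper: each candidate is seq1, the query word is seq2.
    Returns a list of (ratio, candidate) pairs, best match first.
    """
    s = difflib.SequenceMatcher(autojunk=autojunk)
    s.set_seq2(word)  # the query is the *second* sequence
    scored = []
    for x in possibilities:
        s.set_seq1(x)  # each candidate is the *first* sequence
        r = s.ratio()
        if r >= cutoff:
            scored.append((r, x))
    # Sort by ratio only, highest first
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Small made-up example; the word list is not from the question.
words = ["ape", "apple", "peach", "puppy"]
for score, candidate in ranked_matches("appel", words, cutoff=0.6):
    print(round(score, 3), candidate)
```

Calling ranked_matches(test_string, test_list) once with the default and once with autojunk=False would show whether the heuristic affects the scores for these long strings, while still giving access to every individual ratio.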

Upvotes: 0
