Stefan
Stefan

Reputation: 9329

How does BestMatch findall decide how many results to return?

I'm struggling to predict how many fuzzy matches findall() will return when using regex in Python with BESTMATCH enabled:

>>> regex.findall(r'(?b)(North\ West){i<=0,s<=2,d<=1}', "South west South West North West", regex.V1)
['North West']

Does not match South West at all

>>> regex.findall(r'(?b)(North\ West){i<=0,s<=2,d<=1}', "South west South West North West North west South West", regex.V1)
['North West', 'North west', 'South West']

Matches South West

I'm not clear if this is a bug or as intended?

Upvotes: 0

Views: 45

Answers (1)

Stefan
Stefan

Reputation: 9329

I think I have a partial explanation:

The behaviour appears to be:

  • Return all perfect matches (with no substitutions, insertions or deletions)
  • Return all imperfect matches AFTER the last perfect match

It appears (through testing), that BESTMATCH behaves like a normal search, except if it finds a perfect match, it throws away any preceding imperfect matches. Hence the behaviour that it returns a series of perfect matches, followed by zero or more imperfect matches.

Some examples:

>>> regex.findall(r'(?b)(abc){i<=0,s<=2,d<=1}', "abc abd abd aaa abc", regex.V1)
['abc', 'abc']
>>> regex.findall(r'(?b)(abc){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb", regex.V1)
['abc', 'abc', 'abb']
>>> regex.findall(r'(?b)(ab[cd]){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb", regex.V1)
['abc', 'abd', 'abd', 'abc', 'abb']
>>> regex.findall(r'(?b)(ab[cd]){i<=0,s<=2,d<=1}', "abc abd abd aaa abc abb abc", regex.V1)
['abc', 'abd', 'abd', 'abc', 'abc']

Upvotes: 0

Related Questions