Reputation: 13329
I have a string that is the correct spelling of a word:
FOO
I would like to allow someone to mistype the word in ways such as:
FO, F00, F0O, FO0
Is there a nice way to check for this? Lower case should also be accepted, or the input could be converted to upper case first, whichever is prettier.
Upvotes: 2
Views: 927
Reputation: 1469
You can use the 're' module:
import re
re.compile(r'f(o|0)+', re.I)  # re.I ignores case
You can also use curly braces to limit the number of occurrences. You can even get 'fancy', define your 'leet' sets, and add them in with %s, as in:
ay = r'(a|4|\$)'  # '$' must be escaped, otherwise it is the end-of-string anchor
oh = r'(o|0|\))'  # alternatives are separated by '|', not ','
re.compile(r'f%s+' % oh, re.I)
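To make the curly-brace idea concrete, here is a minimal sketch. It assumes the accepted spellings are an "F" followed by one or two characters that are each "O" or "0". The names FOO_PATTERN and looks_like_foo are made up for illustration, and fullmatch is used so that trailing characters are rejected.

```python
import re

# Assumed pattern: "F", then one or two characters that are "O" or "0".
# re.IGNORECASE makes lower-case input acceptable as well.
FOO_PATTERN = re.compile(r'f[o0]{1,2}', re.IGNORECASE)


def looks_like_foo(word):
    """Return True if the whole word is an accepted spelling of FOO."""
    # fullmatch requires the entire string to match, not just a prefix.
    return FOO_PATTERN.fullmatch(word.strip()) is not None


for candidate in ['FO', 'F00', 'F0O', 'FO0', 'foo', 'FOOO', 'BAR']:
    print(candidate, looks_like_foo(candidate))
```

With this pattern 'FOOO' is rejected because {1,2} caps the number of trailing characters at two. Widen the bound if longer runs should be tolerated.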
Upvotes: 1
Reputation: 67113
The builtin module difflib has a get_close_matches function.
You can use it like this:
>>> import difflib
>>> difflib.get_close_matches('FO', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'])
[]
>>> difflib.get_close_matches('F0O', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('FO0', ['FOO', 'BAR', 'BAZ'])
['FOO']
Notice that it doesn't match one of your cases ('F00') with the default cutoff of 0.6. You could lower the cutoff
parameter to get a match:
>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'], cutoff=0.3)
['FOO']
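To see why 'F00' needs a lower cutoff, it can help to look at the similarity ratio that difflib computes. The sketch below is only an illustration: is_close and similarity are hypothetical helper names, and upper-casing both sides is an assumption made to cover the lower-case requirement from the question.

```python
import difflib


def similarity(word, target):
    """Return difflib's similarity ratio (0.0 to 1.0) between word and target."""
    # Both strings are upper-cased so that case differences do not count.
    return difflib.SequenceMatcher(None, target.upper(), word.upper()).ratio()


def is_close(word, target, cutoff=0.6):
    """Return True if get_close_matches accepts word as a spelling of target."""
    matches = difflib.get_close_matches(
        word.upper(), [target.upper()], n=1, cutoff=cutoff
    )
    return bool(matches)


for candidate in ['FO', 'F00', 'F0O', 'FO0', 'foo']:
    print(candidate, round(similarity(candidate, 'FOO'), 3), is_close(candidate, 'FOO'))
```

'F00' shares only the 'F' with 'FOO', so its ratio is 2 * 1 / 6, about 0.33, which is below the default cutoff. Be aware that a cutoff as low as 0.3 also accepts unrelated words such as 'FAR', which has the same ratio.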
Upvotes: 2
Reputation: 838806
One approach is to calculate the edit distance between the strings. You can, for example, use the Levenshtein distance, or invent your own distance function that considers 0 and O closer to each other than 0 and P.
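As a rough sketch of such a custom distance, the following is a plain Levenshtein dynamic program in which substituting look-alike characters costs 0.5 instead of 1. The LOOKALIKES table, the 0.5 cost, and the function names are assumptions chosen for illustration, not part of any library.

```python
# Hypothetical table of look-alike character pairs; extend as needed.
LOOKALIKES = {frozenset('O0'), frozenset('I1')}


def substitution_cost(a, b):
    """Return 0 for identical characters, 0.5 for look-alikes, 1 otherwise."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in LOOKALIKES:
        return 0.5
    return 1.0


def weighted_distance(s, t):
    """Levenshtein distance with cheaper substitutions for look-alikes."""
    s, t = s.upper(), t.upper()  # ignore case
    # previous[j] is the distance between the processed prefix of s and t[:j].
    previous = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        current = [float(i)]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1.0,                          # delete a
                current[j - 1] + 1.0,                       # insert b
                previous[j - 1] + substitution_cost(a, b),  # substitute
            ))
        previous = current
    return previous[-1]


for candidate in ['FO', 'F00', 'F0O', 'FO0', 'foo']:
    print(candidate, weighted_distance(candidate, 'FOO'))
```

A word would then be accepted when its distance to the correct spelling is at or below some threshold (all four misspellings from the question score 1.0 or less here). The threshold itself is a tuning choice.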
Another is to transform each word into a canonical form and compare the canonical forms: two words match when their canonical forms are equal. You can, for example, convert the string to uppercase, replace all 0s with Os, 1s with Is, etc., then collapse runs of repeated letters into a single letter.
>>> import itertools
>>> def canonical_form(s):
...     s = s.upper()
...     s = s.replace('0', 'O')
...     s = s.replace('1', 'I')
...     s = ''.join(k for k, g in itertools.groupby(s))
...     return s
...
>>> canonical_form('FO')
'FO'
>>> canonical_form('F00')
'FO'
>>> canonical_form('F0O')
'FO'
Upvotes: 6