Reputation: 49
For example, A dictionary is provided as nba_dict = {'lakers':100,'Warriors':99}
. The keys of the dict are manually input by users. And we can see there exist a typo diff. Upper case for 'Warriors' and lower case for 'lakers'. The goal is to get the value by the keys. But I want to have a function to handle these typo errors, For example, if I do nba_dict['Aker']
and it will provide me 100.
Upvotes: 1
Views: 79
Reputation: 1219
You can use Levenshtein distance to calculate the minimum number of "single edits" to change the first string into the second one. There is a convenient implementation in C with Python interface:
pip install python-Levenshtein
Then you can compare the distance between your query and all the keys, find the minimum one and return the corresponding value. You can also return an error if the minimal distance is above an empirical threshold to filter "too bad" requests.
from Levenshtein import distance as lev_dist
class FlexibleDict(dict):
def __init__(self, _threshold=3, **kwargs): # change the threshold
super().__init__(**kwargs)
self.threshold = _threshold
def __getitem__(self, query):
if query in self:
return super().__getitem__(query)
keys = list(self.keys())
min_dist, best_key = lev_dist(query, keys[0]), keys[0]
for key in keys[1:]:
dist = lev_dist(query, key)
if dist < min_dist:
min_dist, best_key = dist, key
if min_dist <= self.threshold:
return super().__getitem__(best_key)
raise ValueError(f"key '{query}' not found")
nba_dict = FlexibleDict(lakers=100, warriors=90)
# to change the threshold
# nba_dict = FlexibleDict(_threshold=2, lakers=100, warriors=90)
print(nba_dict['Aker']) # 100
print(nba_dict['gege']) # error
Except for initialization, FlexibleDict
objects can be manipulated exactly as vanilla dict
objects.
Upvotes: 1