Reputation: 2366
Imagine we have a dictionary: {'Hello World': value1, 'Testing': value2}
Now we need to look up a word in the dictionary. The key K will need to exactly match 'Hello World' or 'Testing' for the lookup to succeed.
So let our text = 'hello world'
we still want this to return value1
So how do we handle this regex matching of text to keys? Ideally we don't want to iterate through the dictionary.
Edit: The spacing aspect is just a simple example. The text may vary in case, or be a combination of numbers and letters we want to match. We would usually use a regex pattern.
Upvotes: 5
Views: 5232
Reputation: 106
Perhaps consider normalizing the keys in your dict. Calling Python's str.split with no arguments splits on any run of whitespace, so re-joining the pieces with single spaces (and lowercasing) normalizes both the spacing and the case.
def normalize(word):
    return " ".join(word.split()).lower()
d = {'Hello World': 'value1', 'Testing': 'value2'}
d = {normalize(k): v for k, v in d.items()}
print(d)
>>> {'hello world': 'value1', 'testing': 'value2'}
text = 'hello world'
d[normalize(text)]
>>> 'value1'
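If callers might forget to normalize the query, the same idea can be pushed into a small dict subclass so keys are normalized on both insertion and lookup (a sketch of my own, not part of the answer above; NormalizedDict is a made-up name):

```python
def normalize(word):
    # Collapse runs of whitespace to single spaces and lowercase.
    return " ".join(word.split()).lower()

class NormalizedDict(dict):
    """dict that normalizes keys on both insertion and lookup."""
    def __init__(self, data=None):
        super().__init__()
        for k, v in (data or {}).items():
            self[k] = v

    def __setitem__(self, key, value):
        super().__setitem__(normalize(key), value)

    def __getitem__(self, key):
        return super().__getitem__(normalize(key))

d = NormalizedDict({'Hello World': 'value1', 'Testing': 'value2'})
print(d['hello   world'])  # -> value1
```

Lookups stay O(1), since normalization happens once per key rather than per comparison.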
Upvotes: 0
Reputation: 7850
What you're doing is pretty much defeating the efficiency of dicts, so you're probably better off making your own dict-like class. Here's a simple example:
from re import search, I

class RegexMap(object):
    def __init__(self, *args, **kwargs):
        self._items = dict(*args, **kwargs)

    def __getitem__(self, key):
        for regex in self._items.keys():
            if search(regex, key, I):
                return self._items[regex]
        raise KeyError
Usage:
>>> rm = RegexMap({r'\s*hello\s*world\s*': 1, r'\s*foo\s*bar\s*': 2})
>>> rm['Hello World']
1
>>> rm['foobar']
2
>>> rm['baz']
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
rm['baz']
File "C:\Users\dmurphy\Documents\python\_t.py", line 10, in __getitem__
raise KeyError
KeyError
>>>
From there, you can add more dict functionality. See the Data Model docs.
It does break your "no iteration" clause, but I'm not sure there's any way around that if you want to generalize to regexes.
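As a sketch of the kind of extension mentioned (my addition, assuming the same pattern-scanning approach as above): precompiling the patterns once and adding get/__contains__ makes repeated lookups cheaper and the class more dict-like:

```python
import re

_MISSING = object()  # sentinel distinguishing "no match" from a stored None

class RegexMap:
    """dict-like map whose keys are regex patterns, compiled once."""
    def __init__(self, *args, **kwargs):
        self._items = [(re.compile(p, re.I), v)
                       for p, v in dict(*args, **kwargs).items()]

    def __getitem__(self, key):
        # A linear scan is unavoidable for general regex keys.
        for pattern, value in self._items:
            if pattern.search(key):
                return value
        raise KeyError(key)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        return self.get(key, _MISSING) is not _MISSING

rm = RegexMap({r'\s*hello\s*world\s*': 1, r'\s*foo\s*bar\s*': 2})
print(rm.get('baz', 'no match'))  # -> no match
```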
Upvotes: 4
Reputation: 180441
To help the lookup you could sort the keys and bisect to find where to start looking, narrowing down the lookups and breaking when you find a match or the current key is greater than what you are looking for.
from bisect import bisect_left
d = {'Hello World': "value1", 'Testing': "value2"}
items = sorted([(k.lstrip().lower(),v) for k, v in d.items()])
text = 'hello world'
ind = bisect_left(items, (text.lower(),))
# use items[ind]
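bisect_left only narrows the position, so the candidate at that index still has to be checked before it is used; a sketch completing the idea (my addition, using whitespace/case normalization as the matching rule):

```python
from bisect import bisect_left

d = {'Hello World': 'value1', 'Testing': 'value2'}
items = sorted((k.strip().lower(), v) for k, v in d.items())

def lookup(text):
    key = " ".join(text.split()).lower()
    ind = bisect_left(items, (key,))
    # Verify the candidate really matches before trusting the index.
    if ind < len(items) and items[ind][0] == key:
        return items[ind][1]
    raise KeyError(text)

print(lookup('hello world'))  # -> value1
```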
Or create a mapping using the first couple of letters of each key:
from collections import defaultdict
lookup_mapping = defaultdict(list)
for k in d:
    lookup_mapping[k[:2].lower().lstrip()].append(k)
poss = lookup_mapping[text[:2].lower().lstrip()]
You can either use a regex to find the match or normalize the data by splitting, stripping, etc. It completely depends on what the input can look like, but by grouping you can at least reduce the number of comparisons you have to make.
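Putting the grouping idea together (a sketch of my own, assuming whitespace/case normalization is the matching rule): bucket the keys by a short normalized prefix, then compare only within the matching bucket:

```python
from collections import defaultdict

d = {'Hello World': 'value1', 'Testing': 'value2'}

def norm(s):
    return " ".join(s.split()).lower()

# Bucket the original keys by their first two normalized characters.
buckets = defaultdict(list)
for k in d:
    buckets[norm(k)[:2]].append(k)

def find(text):
    # Only keys sharing the two-character prefix are compared.
    for k in buckets[norm(text)[:2]]:
        if norm(k) == norm(text):
            return d[k]
    raise KeyError(text)

print(find('hello   world'))  # -> value1
```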
Upvotes: 0
Reputation: 174716
I would do it like this:
>>> import re
>>> d = {'Hello World': 'value1', 'Testing': 'value2'}
>>> text = 'hello world'
>>> key = next(x for x in (re.search(r'(?i)' + re.sub(r'(\s)+', r'\1', text.strip()), i) for i in d.keys()) if x).group()
>>> d[key]
'value1'
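One caveat with that one-liner (my note, not part of the answer): next raises StopIteration when no key matches, so passing a default makes a miss explicit; a sketch:

```python
import re

d = {'Hello World': 'value1', 'Testing': 'value2'}

def find(text, default=None):
    # Collapse repeated whitespace, then search each key case-insensitively.
    regex = '(?i)' + re.sub(r'(\s)+', r'\1', text.strip())
    match = next((k for k in d if re.search(regex, k)), None)
    return d[match] if match is not None else default

print(find('hello  world'))  # -> value1
print(find('baz'))           # -> None
```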
Upvotes: 0