redrubia
redrubia

Reputation: 2366

Regex matching on dictionary keys

Imagine we have a dictionary: {'Hello World': value1, 'Testing': value2}

Now we need to look up a word in the dictionary. The key K will need to exactly match 'Hello World' or 'Testing', to use.

So let our text = 'hello world' we still want this to return value1

So how do we handle this regex matching of text to keys? Ideally we don't want to iterate through the dictionary

Edit: Spacing aspect is just a simple example. The text may change in case, be a combination of numbers and letters we want to match. We would usually use a regex pattern

Upvotes: 5

Views: 5232

Answers (4)

klob
klob

Reputation: 106

Perhaps consider normalizing the keys in your dict. Using python's string.split function with no delimiter will remove all whitespace.

def normalize(word):
    return " ".join(word.split()).lower()
d = {'Hello World': 'value1', 'Testing': 'value2'}
d = {normalize(k): v for k, v in d.items()} 
print(d)
>>> {'hello world': 'value1', 'testing': 'value2'}
text = 'hello     world'
d[normalize(text)]
>>> 'value1'

Upvotes: 0

glibdud
glibdud

Reputation: 7850

What you're doing is pretty much defeating the efficiency of dicts, so you're probably better off making your own dict-like class. Here's a simple example:

from re import search, I

class RegexMap(object):
    def __init__(self, *args, **kwargs):
        self._items = dict(*args, **kwargs)
    def __getitem__(self, key):
        for regex in self._items.keys():
            if search(regex, key, I):
                return self._items[regex]
        raise KeyError

Usage:

>>> rm = RegexMap({'\s*hello\s*world\s*':1, '\s*foo\s*bar\s*':2})
>>> rm['Hello World']
1
>>> rm['foobar']
2
>>> rm['baz']
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    rm['baz']
  File "C:\Users\dmurphy\Documents\python\_t.py", line 10, in __getitem__
    raise KeyError
KeyError
>>> 

From there, you can add more dict functionality. See the Data Model docs.

It does break your "no iteration" clause, but I'm not sure there's any way around that if you want to generalize to regexes.

Upvotes: 4

Padraic Cunningham
Padraic Cunningham

Reputation: 180441

To help the lookup you could either sort and bisect to find where to start looking to narrow down the lookups breaking when you find a match or the current key is > that what you are looking fo.

from bisect import bisect_left

d = {'Hello World': "value1", 'Testing': "value2"}

items = sorted([(k.lstrip().lower(),v) for k, v in d.items()])

text = 'hello     world'
ind = bisect_left(items,(text.lower(),), hi=len(items) - 1)
# use items[ind]

Or create a mapping using the first couple of letters of each key:

from collections import defaultdict
lookup_mapping = defaultdict(list)

for k in d:
    lookup_mapping[k[:2].lower().lstrip()].append(k)

poss =  lookup_mapping[text[:2].lower().lstrip()]

You can either use a regex to find the match or normalize the data by slitting, stripping etc.. it completely depends on what the input can look like but by grouping you can at least reduce the amount of comparisons you have to make.

Upvotes: 0

Avinash Raj
Avinash Raj

Reputation: 174716

I would do like,

>>> d = {'Hello World': 'value1', 'Testing': 'value2'}
>>> text = 'hello     world'
>>> key = next(x for x in (re.search(r'(?i)' + re.sub(r'(\s)+', r'\1', text.strip()), i) for i in d.keys()) if x).group()
>>> d[key]
'value1'

Upvotes: 0

Related Questions