Dictionary `__getitem__` multi-subscripting overriding

Question

I'm trying to implement a customized behavior of the dict data structure.

I want to override __getitem__ and apply a regex to the value before returning it to the user.

Snippet:

import os
import re
from typing import Union


class RegexMatchingDict(dict):
    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.replace_with_group = replace_with_group
        self.value_group = value_group
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.update(dct)

    def __getitem__(self, key):
        value: Union[str, dict] = dict.__getitem__(self, key)
        if type(value) is str:
            match = self.regex_matcher.match(value)
            if match:
                # Swap the matched placeholder for the environment variable's value.
                return value.replace(match.group(self.replace_with_group), os.getenv(match.group(self.value_group)))
        return value  # I BELIEVE ISSUE IS HERE

This works perfectly for a single index level (i.e., dict[key]). However, when I chain subscripts (i.e., dict[key1][key2]), only the first level goes through my class's __getitem__. The remaining levels call the default __getitem__ in dict, which does not execute my customized behavior. How can I fix this?

An MCVE:

The code above applies a regular expression to the value and, if the value is a string (i.e., the lowest level in the dict), replaces the matched placeholder with the value of the corresponding environment variable.

dictionary = {"KEY": "{ENVIRONMENT_VARIABLE}"}

custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
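
As a quick check of what the two group numbers refer to, here is a small sketch that runs the same pattern on the sample value, independent of the class. Group 3 is the braced placeholder (the text to replace) and group 4 is the variable name inside it:

```python
import re

# Same pattern as passed to RegexMatchingDict above.
pattern = re.compile(r"((.*({(.+)}).*))")
match = pattern.match("{ENVIRONMENT_VARIABLE}")

print(match.group(3))  # prints {ENVIRONMENT_VARIABLE}
print(match.group(4))  # prints ENVIRONMENT_VARIABLE
```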

Let's set an environment variable called ENVIRONMENT_VARIABLE to 1.

import os

os.environ["ENVIRONMENT_VARIABLE"] = "1"

In this case, the code works perfectly fine:

custom_dict["KEY"]

and the returned value will be:

"1"

However, with multi-level indexing:

dictionary = {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
custom_dict["KEY"]["INDEX_KEY"]

This would return

{ENVIRONMENT_VARIABLE}

P.S. There are many similar questions, but they all (probably) address only top-level indexing.

BoarGules · Accepted Answer

The problem, as you say yourself, is in the last line of your code.

if type(value) is str:
    ...
else:
    return value # I BELIEVE ISSUE IS HERE

When value is a nested dict, this returns a plain dict. But you want to return a RegexMatchingDict instead, one that knows how to handle the second level of indexing. So when value is a dict, convert it to a RegexMatchingDict and return that instead. Then when __getitem__() is called to perform the second level of indexing, you will get your version and not the standard one.

Something like this:

return RegexMatchingDict(value, self.regex_str, self.value_group, self.replace_with_group)

This copies the other arguments from the first level since it is hard to see how the second level could be different.
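
Put together, a minimal sketch of the whole class might look like the following. It assumes the same constructor as in the question. The isinstance checks and the fallback for an unset environment variable are my additions, not part of the original code: they keep non-dict values such as numbers from being wrapped, and leave the string untouched when os.getenv returns None.

```python
import os
import re


class RegexMatchingDict(dict):
    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.value_group = value_group
        self.replace_with_group = replace_with_group
        self.update(dct)

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if isinstance(value, RegexMatchingDict):
            # Already wrapped, nothing to do.
            return value
        if isinstance(value, dict):
            # Wrap nested dicts so the next subscript uses this __getitem__ too.
            return RegexMatchingDict(
                value, self.regex_str, self.value_group, self.replace_with_group
            )
        if isinstance(value, str):
            match = self.regex_matcher.match(value)
            if match:
                env_value = os.getenv(match.group(self.value_group))
                # Assumption: if the variable is unset, return the string unchanged.
                if env_value is not None:
                    return value.replace(
                        match.group(self.replace_with_group), env_value
                    )
        return value


os.environ["ENVIRONMENT_VARIABLE"] = "1"
nested = RegexMatchingDict(
    {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}},
    r"((.*({(.+)}).*))", 4, 3,
)
print(nested["KEY"]["INDEX_KEY"])  # prints 1
```

Note that each lookup of a nested dict builds a new wrapper around a copy of that level, so an assignment such as nested["KEY"]["OTHER"] = "x" changes only the temporary wrapper and not the stored data. Methods like get(), values() and items() do not go through __getitem__, so they still return the raw values.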
