ndrwnaguib

Reputation: 6115

Dictionary `__getitem__` multi-subscripting overriding

I'm trying to implement a customized behavior of the dict data structure.

I want to override the __getitem__ and apply some sort of regex on the value before returning it to the user.

Snippet:

import os
import re
from typing import Union

class RegexMatchingDict(dict):
    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.replace_with_group = replace_with_group
        self.value_group = value_group
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.update(dct)

    def __getitem__(self, key):
        value: Union[str, dict] = dict.__getitem__(self, key)
        if type(value) is str:
            match = self.regex_matcher.match(value)
            if match:
                return value.replace(match.group(self.replace_with_group), os.getenv(match.group(self.value_group)))
        return value # I BELIEVE ISSUE IS HERE

This works perfectly for a single index level (i.e., dict[key]). However, when I try to multi-index it (i.e., dict[key1][key2]), the first index level returns an object of my class, but the other levels call the default __getitem__ of dict, which does not execute my customized behavior. How can I fix this?


An MCVE:

The aforementioned code applies a regular expression to the value and, if the value is a string (i.e., the lowest level in the dict), converts it to the value of the corresponding environment variable.

dictionary = {"KEY": "{ENVIRONMENT_VARIABLE}"}

custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)

Let's set an environment variable called ENVIRONMENT_VARIABLE to 1.

import os

os.environ["ENVIRONMENT_VARIABLE"] = "1"

In this case, the code works perfectly fine:

custom_dict["KEY"]

and the returned value will be:

"1"

However, if we had a multi-level indexing

dictionary = {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
custom_dict["KEY"]["INDEX_KEY"]

This would return

{ENVIRONMENT_VARIABLE}

P.S. There are many similar questions, but they all (probably) address only top-level indexing.

Upvotes: 0

Views: 51

Answers (2)

Jacques Gaudin

Reputation: 16968

In your example, your second level dictionary is a normal dict and therefore doesn't use your custom __getitem__ method.

The code below shows what should be done to have an internal custom dict:

sec_level_dict = {"KEY": "{ENVIRONMENT_VARIABLE}"}

sec_level_custom_dict = RegexMatchingDict(sec_level_dict, r"((.*({(.+)}).*))", 4, 3)

dictionary = {"KEY": sec_level_custom_dict}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
print(custom_dict["KEY"]["KEY"])
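As a quick check of the diagnosis above, the sketch below uses a stripped-down stand-in class (the name `Tracing` and its `seen` list are hypothetical, not part of the question) that only records which keys pass through the overridden `__getitem__`. It suggests that `update()` stores the nested dict unchanged, so the second lookup never reaches the subclass.

```python
class Tracing(dict):
    """Hypothetical stand-in for RegexMatchingDict: only records lookups."""

    def __init__(self, dct):
        super().__init__()
        self.seen = []      # keys looked up through the overridden method
        self.update(dct)    # stores nested dicts as they are, without wrapping

    def __getitem__(self, key):
        self.seen.append(key)
        return dict.__getitem__(self, key)


outer = Tracing({"KEY": {"INDEX_KEY": "value"}})
inner = outer["KEY"]        # handled by Tracing.__getitem__
leaf = inner["INDEX_KEY"]   # handled by the built-in dict.__getitem__

print(type(inner) is dict)  # True: the nested value is still a plain dict
print(outer.seen)           # ['KEY']: only the first lookup was recorded
```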

If you want to automate this and convert every nested dict into your custom dict, you can customize __setitem__ following this pattern:

class CustomDict(dict):

    def __init__(self, dct):
        super().__init__()
        for k, v in dct.items():
            self[k] = v

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        print("Dictionary:", self, "key:", key, "value:", value)
        return value

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            dict.__setitem__(self, key, self.__class__(value))
        else:
            dict.__setitem__(self, key, value)

a = CustomDict({'k': {'k': "This is my nested value"}})

print(a['k']['k'])
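The pattern above can be combined with the regex substitution from the question. This is only a sketch under a few assumptions: the class name `NestedRegexDict` is made up for illustration, the constructor arguments mirror those of `RegexMatchingDict`, and an unset environment variable leaves the placeholder untouched (the original code would pass `None` to `str.replace` in that case).

```python
import os
import re


class NestedRegexDict(dict):
    """Hypothetical name: wraps nested dicts on write, substitutes on read."""

    def __init__(self, dct, regex, value_group, replace_with_group):
        super().__init__()
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.value_group = value_group
        self.replace_with_group = replace_with_group
        for k, v in dct.items():  # goes through __setitem__, unlike update()
            self[k] = v

    def __setitem__(self, key, value):
        # Wrap plain nested dicts so that deeper lookups use this class too.
        if isinstance(value, dict) and not isinstance(value, NestedRegexDict):
            value = NestedRegexDict(
                value, self.regex_str, self.value_group, self.replace_with_group
            )
        dict.__setitem__(self, key, value)

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if isinstance(value, str):
            match = self.regex_matcher.match(value)
            if match:
                env_value = os.getenv(match.group(self.value_group))
                if env_value is not None:  # assumption: keep placeholder if unset
                    return value.replace(
                        match.group(self.replace_with_group), env_value
                    )
        return value


os.environ["ENVIRONMENT_VARIABLE"] = "1"
nested = NestedRegexDict(
    {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}, r"((.*({(.+)}).*))", 4, 3
)
print(nested["KEY"]["INDEX_KEY"])  # 1
```

Note that methods such as `get()`, `values()` and `items()` do not go through an overridden `__getitem__` on a dict subclass, so they would still return the raw placeholder unless they are overridden as well.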

Upvotes: 0

BoarGules

Reputation: 16952

The problem, as you say yourself, is in the last line of your code.

if type(value) is str:
    ...
else:
    return value # I BELIEVE ISSUE IS HERE

This is returning a dict. But you want to return a RegexMatchingDict instead, that will know how to handle the second level of indexing. So instead of returning value if it is a dict, convert it to a RegexMatchingDict and return that instead. Then when __getitem__() is called to perform the second level of indexing, you will get your version and not the standard one.

Something like this:

return RegexMatchingDict(value, self.regex_str, self.value_group, self.replace_with_group)

This copies the other arguments from the first level since it is hard to see how the second level could be different.
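Put together, the whole method might look like the sketch below. The class name `LazyRegexDict` is hypothetical, and the handling of an unset environment variable (leaving the placeholder in place) is an assumption that the question does not specify.

```python
import os
import re


class LazyRegexDict(dict):
    """Hypothetical name: wraps nested dicts at lookup time."""

    def __init__(self, dct, regex, value_group, replace_with_group):
        super().__init__(dct)
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.value_group = value_group
        self.replace_with_group = replace_with_group

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        if isinstance(value, str):
            match = self.regex_matcher.match(value)
            if match:
                env_value = os.getenv(match.group(self.value_group))
                if env_value is not None:  # assumption: keep placeholder if unset
                    return value.replace(
                        match.group(self.replace_with_group), env_value
                    )
            return value
        if isinstance(value, dict):
            # A new wrapper (a shallow copy) is built on every lookup.
            return LazyRegexDict(
                value, self.regex_str, self.value_group, self.replace_with_group
            )
        return value


os.environ["ENVIRONMENT_VARIABLE"] = "1"
config = LazyRegexDict(
    {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}, r"((.*({(.+)}).*))", 4, 3
)
print(config["KEY"]["INDEX_KEY"])  # 1

# Trade-off: the wrapper is a copy, so writes to it do not reach the original.
config["KEY"]["NEW"] = "x"
print("NEW" in dict.__getitem__(config, "KEY"))  # False
```

Because each lookup returns a fresh copy, this variant suits read-only use; if nested values need to be modified through indexing, wrapping them once when they are stored avoids the copy.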

Upvotes: 1
