Reputation: 6115
I'm trying to implement customized behavior for the dict data structure.
I want to override __getitem__ and apply a regex to the value before returning it to the user.
Snippet:
import os
import re
from typing import Union

class RegexMatchingDict(dict):
    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.replace_with_group = replace_with_group
        self.value_group = value_group
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.update(dct)

    def __getitem__(self, key):
        value: Union[str, dict] = dict.__getitem__(self, key)
        if type(value) is str:
            match = self.regex_matcher.match(value)
            if match:
                return value.replace(match.group(self.replace_with_group), os.getenv(match.group(self.value_group)))
        return value  # I BELIEVE ISSUE IS HERE
This works perfectly for a single index level (i.e., dict[key]). However, when I index multiple levels (i.e., dict[key1][key2]), only the first level goes through my class's __getitem__. The nested value it returns is a plain dict, so every further level calls the default __getitem__ of dict, which does not execute my customized behavior. How can I fix this?
An MCVE:
The code above applies a regular expression to the value and, if the value is a string (i.e., the lowest level in the dict), replaces the matched placeholder with the value of the corresponding environment variable.
dictionary = {"KEY": "{ENVIRONMENT_VARIABLE}"}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
Let's set an environment variable called ENVIRONMENT_VARIABLE to 1.
import os
os.environ["ENVIRONMENT_VARIABLE"] = "1"
In this case, the code works perfectly fine:
custom_dict["KEY"]
and the returned value will be:
"1"
However, with multi-level indexing:
dictionary = {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
custom_dict["KEY"]["INDEX_KEY"]
This would return
{ENVIRONMENT_VARIABLE}
P.S. There are many similar questions, but they all (probably) address only top-level indexing.
Upvotes: 0
Views: 51
Reputation: 16968
In your example, the second-level dictionary is a normal dict and therefore doesn't use your custom __getitem__ method.
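To make this concrete, here is a minimal sketch of the effect. LoudDict is a hypothetical class made up for illustration (it is not from the question); it only records which keys pass through its own __getitem__:

```python
class LoudDict(dict):
    """Hypothetical dict subclass that records every key looked up through it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def __getitem__(self, key):
        self.seen.append(key)
        return super().__getitem__(key)


outer = LoudDict({"KEY": {"INDEX_KEY": "value"}})
inner = outer["KEY"]         # goes through LoudDict.__getitem__
result = inner["INDEX_KEY"]  # goes through the plain dict.__getitem__

print(type(inner) is dict)   # the nested mapping was stored as-is
print(outer.seen)            # only the outer lookup was recorded
```

The nested value is stored unchanged, so nothing in the subclass ever sees the second lookup.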
The code below shows what should be done to have an internal custom dict:
sec_level_dict = {"KEY": "{ENVIRONMENT_VARIABLE}"}
sec_level_custom_dict = RegexMatchingDict(sec_level_dict, r"((.*({(.+)}).*))", 4, 3)
dictionary = {"KEY": sec_level_custom_dict}
custom_dict = RegexMatchingDict(dictionary, r"((.*({(.+)}).*))", 4, 3)
print(custom_dict["KEY"]["KEY"])
If you want to automate this and transform every nested dict into a custom dict, you can customize __setitem__ following this pattern:
class CustomDict(dict):
    def __init__(self, dct):
        super().__init__()
        for k, v in dct.items():
            self[k] = v

    def __getitem__(self, key):
        value = dict.__getitem__(self, key)
        print("Dictionary:", self, "key:", key, "value:", value)
        return value

    def __setitem__(self, key, value):
        if isinstance(value, dict):
            dict.__setitem__(self, key, self.__class__(value))
        else:
            dict.__setitem__(self, key, value)

a = CustomDict({'k': {'k': "This is my nested value"}})
print(a['k']['k'])
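One caveat with this pattern: in CPython, the built-in dict.update() does not call an overridden __setitem__, so the self.update(dct) call in the question's __init__ would skip the wrapping. Below is a rough sketch of one way around that, assuming you are fine with overriding update() as well. NestingDict is a hypothetical name used only for this illustration:

```python
class NestingDict(dict):
    """Hypothetical variant of the CustomDict pattern that also covers update()."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # Wrap plain dicts (but not already-wrapped ones) on the way in.
        if isinstance(value, dict) and not isinstance(value, NestingDict):
            value = type(self)(value)
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # Route every pair through __setitem__ so nested dicts get wrapped.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


d = NestingDict({"a": {"b": {"c": 1}}})
d.update({"x": {"y": 2}})
print(type(d["a"]["b"]).__name__)
print(type(d["x"]).__name__)
```

Other mutators such as setdefault() would need the same treatment if you rely on them.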
Upvotes: 0
Reputation: 16952
The problem, as you say yourself, is in the last line of your code.
if type(value) is str:
    ...
else:
    return value  # I BELIEVE ISSUE IS HERE
This is returning a dict. But you want to return a RegexMatchingDict instead, which will know how to handle the second level of indexing. So instead of returning value if it is a dict, convert it to a RegexMatchingDict and return that. Then when __getitem__() is called to perform the second level of indexing, you will get your version and not the standard one.
Something like this:
return RegexMatchingDict(value, self.regex_str, self.value_group, self.replace_with_group)
This copies the other arguments from the first level since it is hard to see how the second level could be different.
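Put together, the whole class might look roughly like the sketch below. Two things here are my own assumptions rather than part of the question: the isinstance checks (so only dicts get wrapped), and leaving the placeholder untouched when the environment variable is not set, since os.getenv() returns None in that case and str.replace() would raise a TypeError.

```python
import os
import re


class RegexMatchingDict(dict):
    def __init__(self, dct, regex, value_group, replace_with_group, **kwargs):
        super().__init__(**kwargs)
        self.regex_str = regex
        self.regex_matcher = re.compile(regex)
        self.value_group = value_group
        self.replace_with_group = replace_with_group
        self.update(dct)

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, str):
            match = self.regex_matcher.match(value)
            if match:
                # Assumption: an unset variable leaves the placeholder untouched.
                env_value = os.getenv(match.group(self.value_group))
                if env_value is not None:
                    return value.replace(match.group(self.replace_with_group), env_value)
            return value
        if isinstance(value, dict) and not isinstance(value, RegexMatchingDict):
            # Wrap nested plain dicts so the next [] also uses this __getitem__.
            return RegexMatchingDict(value, self.regex_str, self.value_group, self.replace_with_group)
        return value


os.environ["ENVIRONMENT_VARIABLE"] = "1"
nested = {"KEY": {"INDEX_KEY": "{ENVIRONMENT_VARIABLE}"}}
custom_dict = RegexMatchingDict(nested, r"((.*({(.+)}).*))", 4, 3)
print(custom_dict["KEY"]["INDEX_KEY"])
```

Note that the wrapper is a copy created on each access, so assigning into custom_dict["KEY"] does not change the dict stored inside custom_dict.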
Upvotes: 1