Reputation: 83
I am creating a nested reference dictionary that records every key a data dictionary could have; the corresponding values are the keys to use in the flat dictionary.
The data dictionary's keys will always be a subset of the keys of the reference dictionary. The flat dictionary's keys will always be a subset of the set of values of the reference dictionary.
In other words, given a reference dictionary with assignments like this:
```python
reference['agent']['address'] = 'agentaddress'
reference['agent']['zone']['id'] = 'agentzoneid'
reference['eventid'] = 'eventid'
reference['file']['hash'] = 'filehash'
reference['file']['name'] = 'filename'
```
and a data dictionary with assignments like this:
```python
nested['agent']['address'] = '172.16.16.16'
nested['eventid'] = '1234566778'
nested['file']['name'] = 'reallybadfile.exe'
```
the code should produce a flat dictionary equivalent to these assignments:
```python
flat['agentaddress'] = '172.16.16.16'
flat['eventid'] = '1234566778'
flat['filename'] = 'reallybadfile.exe'
```
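The chained assignments above only work if every intermediate level already exists; with plain dicts, `reference['agent']['zone']['id'] = ...` raises `KeyError`. A minimal sketch, assuming you want to keep this assignment style, is a recursive `defaultdict` that creates missing levels on first access (the helper name `tree` is hypothetical). A plain `defaultdict(dict)` only auto-creates one level, which is why deeper paths need extra setup.

```python
from collections import defaultdict


def tree():
    """Return a dict that creates another tree() for any missing key."""
    return defaultdict(tree)


# Each missing level is created automatically, to any depth.
reference = tree()
reference['agent']['address'] = 'agentaddress'
reference['agent']['zone']['id'] = 'agentzoneid'
reference['eventid'] = 'eventid'
reference['file']['hash'] = 'filehash'
reference['file']['name'] = 'filename'

nested = tree()
nested['agent']['address'] = '172.16.16.16'
nested['eventid'] = '1234566778'
nested['file']['name'] = 'reallybadfile.exe'
```

Because `defaultdict` subclasses `dict`, `isinstance(x, dict)` checks in a recursive walker still recognize these levels. One caveat: merely reading a missing key (for example `nested['agent']['zone']`) also creates an empty level, so lookups on such a tree are safer with `.get()` or `in`, which do not trigger the default factory.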
I can never know which fields in the nested dictionary will be populated and which will not, but I can know the mappings in the reference dictionary.
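Because the reference mapping is known in advance, its leaf values are exactly the set of keys the flat dictionary may contain. A small sketch that collects them recursively, using the sample reference written as plain dict literals (the function name `leaf_values` is hypothetical):

```python
def leaf_values(ref):
    """Yield every non-dict value in a nested reference dictionary."""
    for value in ref.values():
        if isinstance(value, dict):
            # Descend into the child dictionary.
            yield from leaf_values(value)
        else:
            yield value


reference = {
    'agent': {'address': 'agentaddress', 'zone': {'id': 'agentzoneid'}},
    'eventid': 'eventid',
    'file': {'hash': 'filehash', 'name': 'filename'},
}

possible_flat_keys = set(leaf_values(reference))
print(sorted(possible_flat_keys))
# ['agentaddress', 'agentzoneid', 'eventid', 'filehash', 'filename']
```

Any flat dictionary built from valid data should then satisfy `set(flat) <= possible_flat_keys`.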
I expect I will need recursion to descend into child dictionaries, and possibly some sort of indirection to build the flat dictionary's keys from the reference dictionary's values and its values from the matching entries of the nested dictionary.
So far, though, I have not managed to write code that makes sense.
At a very high level, it might look something like this:
```python
def this(ref, nest, flat, *args):
    for k, v in ref.items():
        if isinstance(v, dict):
            this(?, ?, ?, ?)
        elif nest[path][to][k]:
            flat[ref[path][to][k]] = nest[path][to][k]
```
where `[path][to][k]` represents some way to do indirection, and `*args` is something I would pass to the recursive function so that it has enough context to reach through the nested dictionaries to the keys and values I need.
Upvotes: 1
Views: 209
Reputation: 81
@StephenRauch's answer is good; if you do not want to use generators, simply restructure it as follows:
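```python
# r=reference, n=nested, f=final (flat) dictionary
def buildDict(r, n, f):
    for key, value in n.items():
        mapped = r.get(key)
        if mapped is None:
            continue  # key has no entry in the reference; skip it
        if isinstance(value, dict):
            # Recurse into the matching child of both dictionaries.
            buildDict(mapped, value, f)
        else:
            f[mapped] = value
```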
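If you also want to avoid recursion (CPython limits recursion depth, which matters only for very deeply nested data), the same walk can use an explicit stack. This is a swapped-in technique, not the answer's code; the name `build_flat` is hypothetical, and by assumption it skips entries the reference lacks or whose shape (dict vs. leaf) disagrees with the reference.

```python
def build_flat(reference, nested):
    """Flatten nested using reference, with an explicit stack instead of recursion."""
    flat = {}
    # Each stack entry pairs a reference level with the matching data level.
    stack = [(reference, nested)]
    while stack:
        ref_level, data_level = stack.pop()
        for key, value in data_level.items():
            mapped = ref_level.get(key)
            if mapped is None:
                continue  # key not described by the reference
            if isinstance(value, dict) and isinstance(mapped, dict):
                stack.append((mapped, value))  # descend later
            elif not isinstance(value, dict) and not isinstance(mapped, dict):
                flat[mapped] = value  # leaf: store under the flat key
            # Otherwise the shapes disagree; this sketch ignores the entry.
    return flat


reference = {
    'agent': {'address': 'agentaddress', 'zone': {'id': 'agentzoneid'}},
    'eventid': 'eventid',
    'file': {'hash': 'filehash', 'name': 'filename'},
}
nested = {
    'agent': {'address': '172.16.16.16'},
    'eventid': '1234566778',
    'file': {'name': 'reallybadfile.exe'},
}
flat = build_flat(reference, nested)
```

The resulting dictionary has the same contents as the recursive versions, but its key order may differ because the stack is processed last-in, first-out.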
Upvotes: 0
Reputation: 49784
Using a generator, this is fairly straightforward:
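```python
def make_flat_tuples(data, ref):
    for k, v in data.items():
        if isinstance(v, dict):
            # Recurse into the child dict, using the matching reference level.
            yield from make_flat_tuples(v, ref[k])
        else:
            # Leaf: emit a (flat key, value) pair.
            yield ref[k], v

flat = dict(make_flat_tuples(nested, reference))
```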
Test code:
```python
from collections import defaultdict

reference = defaultdict(dict)
reference['agent'] = defaultdict(dict)
reference['agent']['address'] = 'agentaddress'
reference['agent']['zone']['id'] = 'agentzoneid'
reference['eventid'] = 'eventid'
reference['file']['hash'] = 'filehash'
reference['file']['name'] = 'filename'

nested = defaultdict(dict)
nested['agent']['address'] = '172.16.16.16'
nested['eventid'] = '1234566778'
nested['file']['name'] = 'reallybadfile.exe'

print(dict(make_flat_tuples(nested, reference)))
```
Results:
```
{
    'agentaddress': '172.16.16.16',
    'eventid': '1234566778',
    'filename': 'reallybadfile.exe'
}
```
Upvotes: 3