This question is related to "Flatten nested dictionaries, compressing keys". I used the idea from there and modified it to also unnest lists of dictionaries, but I was unable to get it working.
My input will be a dictionary which can contain nested dictionaries or nested lists (lists of dictionaries), as shown in the input to the function `flatten_dict_list`.
Below is my source code:
```python
import collections.abc

def flatten_dict_list(d, parent_key='', sep='.'):
    items = []
    for k, v in d.items():
        print("***")
        new_key = parent_key + sep + k if parent_key else k
        print(new_key)
        if isinstance(v, list):
            for z in v:
                if isinstance(z, collections.abc.MutableMapping):
                    print(type(z))
                    print(z)
                    items.extend(flatten_dict_list(z, new_key, sep=sep).items())
                else:
                    print(type(z))
                    print(z)
                    items.append((new_key, z))
        else:
            if isinstance(v, collections.abc.MutableMapping):
                print(type(v))
                print(v)
                items.extend(flatten_dict_list(v, new_key, sep=sep).items())
            else:
                print(type(v))
                print(v)
                items.append((new_key, v))
    return dict(items)

print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))
```
Expected output is: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}
My actual result is: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c': [{'s': 2}, {'t': 100}], 'd': 3}
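The mismatch comes from the third element of `'c'`: it is itself a list, not a `MutableMapping`, so the original function falls through to the `else` branch and stores the whole list under `'c'`. Below is a minimal sketch of one way to patch the original approach, assuming lists may be nested inside lists to any depth; `_flatten_value` is a hypothetical helper name, and the debug prints are left out.

```python
import collections.abc


def _flatten_value(v, key, sep):
    """Hypothetical helper: return (key, value) pairs for a value of any kind."""
    if isinstance(v, collections.abc.MutableMapping):
        # dictionaries are flattened with the current key as their prefix
        return list(flatten_dict_list(v, key, sep).items())
    if isinstance(v, list):
        # lists (including lists nested inside lists) are walked element by element
        pairs = []
        for z in v:
            pairs.extend(_flatten_value(z, key, sep))
        return pairs
    # anything else is a leaf value
    return [(key, v)]


def flatten_dict_list(d, parent_key='', sep='.'):
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        items.extend(_flatten_value(v, new_key, sep))
    # later pairs with the same key overwrite earlier ones, so 'd' ends up as 3
    return dict(items)


print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))
# {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}
```

The only structural change is that list elements go back through the same dispatch as any other value, so a list inside a list is handled like a list inside a dict.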
Given that `{'a': {'b': 1, 'c': 2}}` and `{'a': [{'b': 1}, {'c': 2}]}` will both map to the same `{'a.b': 1, 'a.c': 2}`, this seems like a fairly clean solution:
```python
def _compound_key_value(xs, prefix):
    if isinstance(xs, list):
        for x in xs:
            yield from _compound_key_value(x, prefix)
    elif isinstance(xs, dict):
        for k, v in xs.items():
            for p, r in _compound_key_value(v, prefix):
                yield prefix + (k,) + p, r
    else:
        yield prefix, xs

def flatten_dict_list(dl):
    return {'.'.join(k): v for k, v in _compound_key_value(dl, ())}

print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))
```
Output:
{'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}
Note that it is recursive, like the solution you started out with, so I also assumed maximum recursion depth would not become an issue.
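If the recursion depth ever does become a problem (CPython's default limit is 1000, see `sys.getrecursionlimit()`), the same traversal can be done with an explicit stack. This is a sketch of a swapped-in iterative technique, not the answer's code; `flatten_dict_list_iterative` is a hypothetical name, and it assumes the same rules as above: dicts add a key part, lists add nothing, and everything else is a leaf.

```python
def flatten_dict_list_iterative(dl, sep='.'):
    """Flatten nested dicts/lists like the recursive version, using an explicit stack."""
    result = {}
    # each stack entry is (tuple of key parts so far, value still to be processed)
    stack = [((), dl)]
    while stack:
        key, value = stack.pop()
        if isinstance(value, dict):
            # push in reverse so that items are popped in their original order
            for k, v in reversed(list(value.items())):
                stack.append((key + (k,), v))
        elif isinstance(value, list):
            for v in reversed(value):
                stack.append((key, v))
        else:
            # leaf: later duplicates overwrite earlier ones, as in the dict comprehension
            result[sep.join(key)] = value
    return result


print(flatten_dict_list_iterative({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))

# nesting far deeper than the default recursion limit is not a problem here
deep = 1
for _ in range(5000):
    deep = {'k': deep}
print(len(flatten_dict_list_iterative(deep)))  # 1
```

Pushing children in reverse keeps the output order identical to the recursive generator, at the cost of building a temporary list for each dict.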
A few follow-up questions from the comments:
Difference between `yield` and `yield from`?
`yield <value>` directly makes a generator yield that value. As you've probably read by now, when a value is requested from a generator, it runs until it has a value to yield, yields it and pauses, then resumes when the next value is needed, and so on. `yield from <some other generator>` makes the generator request a value from the other generator and immediately yield it; it keeps doing so, one value at a time, until the other generator has nothing left, and only then continues with the rest of the code, up to the next `yield`.
In the solution, `_compound_key_value(x, prefix)` starts a new generator recursively, and the values it produces are passed on one by one using `yield from`. If the code had been `yield _compound_key_value(x, prefix)` (without the `from`), it would have yielded the generator object itself instead of the values from it; that can be useful as well, but not here.
The same result could be achieved with `for a in _compound_key_value(x, prefix): yield a`, except that this would be slower, because with `yield from` the values are handed from the inner generator to the caller without an explicit Python loop in the outer generator; `yield from` is also easier to read.
TL;DR: `yield x` yields `x`; `yield from x` requires `x` to be iterable (for example, another generator) and yields everything from `x` one at a time.
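A small standalone comparison of the variants discussed above; the function names are made up for illustration:

```python
import types


def inner():
    yield 1
    yield 2


def with_yield():
    # yields the generator object itself, not its values
    yield inner()


def with_yield_from():
    # delegates to inner() and yields 1, then 2
    yield from inner()


def with_loop():
    # same values as with_yield_from(), written as an explicit loop
    for a in inner():
        yield a


def from_list():
    # yield from also accepts any iterable, not just generators
    yield from [3, 4]


print(list(with_yield_from()))  # [1, 2]
print(list(with_loop()))        # [1, 2]
print(list(from_list()))        # [3, 4]
only = list(with_yield())
print(len(only), isinstance(only[0], types.GeneratorType))  # 1 True
```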
Why are you passing an empty tuple as prefix?
To avoid having to check whether `prefix` has a value at all, it needs an initial value. I chose to pass the empty tuple explicitly instead of making it the default in the signature, like this: `def _compound_key_value(xs, prefix=()):`
Either works, but I felt that no default looked cleaner, and since `_compound_key_value` is an internal function, not intended for direct use outside functions like `flatten_dict_list`, the requirement to pass an empty tuple when calling it seemed reasonable.
TL;DR: `prefix=()` as a default would also have worked.
What is this doing? `(k,)`
It is part of one statement: `yield prefix + (k,) + p, r`. This yields a tuple of two values, the first being `prefix + (k,) + p`. `prefix` is the function parameter, which expects a tuple, and `p` is also a tuple, since it is the first half of each pair yielded by the recursive call. Adding tuples together produces a new tuple with all the parts combined in order, so `(k,)` takes a key as obtained from `xs.items()` and puts it in a tuple by itself, so that it can be added to the other tuples. The combined tuple is then yielded as the first half of a pair, with `r` as the second half.
TL;DR: `(k,)` makes a new tuple with a single element, `k`.
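A quick illustration of the tuple arithmetic, with made-up values for `k`, `prefix` and `p`:

```python
k = 'b'
prefix = ()
p = ('x', 'y')

print((k))    # b  -> parentheses alone do not create a tuple
print((k,))   # ('b',)  -> the trailing comma does

key = prefix + (k,) + p
print(key)            # ('b', 'x', 'y')
print('.'.join(key))  # b.x.y  -> what flatten_dict_list turns the key tuple into

try:
    _ = prefix + k  # a tuple and a str cannot be added
except TypeError as exc:
    print('TypeError:', exc)
```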