Aady

Reputation: 56

Unnest dictionary with nested Dictionary or list of dictionaries

This question is related to "Flatten nested dictionaries, compressing keys". I adapted that idea to also unnest lists of dictionaries, but could not get it to work.

My input is a dictionary that can contain nested dictionaries or nested lists (lists of dictionaries), as shown in the call to flatten_dict_list below.

Below is my source code:

import collections.abc


def flatten_dict_list(d, parent_key='', sep='.'):
    items = []
    for k, v in d.items():
        print("***")
        new_key = parent_key + sep + k if parent_key else k
        print(new_key)

        if isinstance(v, list):
          for z in v:
              if isinstance(z, collections.abc.MutableMapping):
                  print(type(z))
                  print(z)
                  items.extend(flatten_dict_list(z, new_key, sep=sep).items())
              else:
                  print(type(z))
                  print(z)
                  items.append((new_key, z))
        else:
          if isinstance(v, collections.abc.MutableMapping):
             print(type(v))
             print(v)
             items.extend(flatten_dict_list(v, new_key, sep=sep).items())
          else:
             print(type(v))
             print(v)
             items.append((new_key, v))
    return dict(items)

print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y' : 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))

Expected output is: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}

My actual result is: {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c': [{'s': 2}, {'t': 100}], 'd': 3}

Upvotes: 0

Views: 1229

Answers (1)

Grismar

Reputation: 31339

Given that:

  • you don't care that both {'a': {'b': 1, 'c': 2}} and {'a': [{'b': 1}, {'c': 2}]} will map to the same {'a.b': 1, 'a.c': 2}
  • there won't be anything other than dictionaries in the lists in the input data structure

This seems like a fairly clean solution:

def _compound_key_value(xs, prefix):
    if isinstance(xs, list):
        for x in xs:
            yield from _compound_key_value(x, prefix)
    elif isinstance(xs, dict):
        for k, v in xs.items():
            for p, r in _compound_key_value(v, prefix):
                yield prefix + (k,) + p, r
    else:
        yield prefix, xs


def flatten_dict_list(dl):
    return {'.'.join(k): v for k, v in _compound_key_value(dl, ())}


print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y' : 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))

Output

{'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}
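
The two assumptions listed at the top of this answer can be checked directly. The sketch below repeats the two functions so that it runs on its own; the small inputs are made up for illustration.

```python
def _compound_key_value(xs, prefix):
    # Same generator as in the solution, repeated so this snippet is self-contained.
    if isinstance(xs, list):
        for x in xs:
            yield from _compound_key_value(x, prefix)
    elif isinstance(xs, dict):
        for k, v in xs.items():
            for p, r in _compound_key_value(v, prefix):
                yield prefix + (k,) + p, r
    else:
        yield prefix, xs


def flatten_dict_list(dl):
    return {'.'.join(k): v for k, v in _compound_key_value(dl, ())}


# A nested dictionary and a list of dictionaries flatten to the same result.
nested_dict = flatten_dict_list({'a': {'b': 1, 'c': 2}})
list_of_dicts = flatten_dict_list({'a': [{'b': 1}, {'c': 2}]})
print(nested_dict == list_of_dicts)  # True

# A list of plain values collapses to its last element: every element is
# yielded under the same key, and the dictionary keeps only the last one.
print(flatten_dict_list({'d': [1, 2, 3]}))  # {'d': 3}
```

If either behaviour is a problem for your data, the keys would need to carry extra information (a list index, for example), which is beyond what the question asked for.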

Note that it is recursive, like the solution you started out with, so I also assumed maximum recursion depth would not become an issue.
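
If recursion depth ever did become an issue, the same traversal could be written with an explicit stack instead of recursive calls. This is only a sketch of that alternative, not part of the solution above; the name flatten_dict_list_iterative and the sep parameter are my own choices here.

```python
def flatten_dict_list_iterative(dl, sep='.'):
    result = {}
    # Each stack entry is an iterator over pending (prefix, value) pairs.
    stack = [iter([((), dl)])]
    while stack:
        try:
            prefix, value = next(stack[-1])
        except StopIteration:
            stack.pop()  # this level is exhausted, go back up one level
            continue
        if isinstance(value, list):
            # List elements share the prefix of the list itself.
            stack.append(iter([(prefix, x) for x in value]))
        elif isinstance(value, dict):
            # Dictionary values extend the prefix with their own key.
            stack.append(iter([(prefix + (k,), v) for k, v in value.items()]))
        else:
            result[sep.join(prefix)] = value
    return result


data = {'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999},
                      [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}
print(flatten_dict_list_iterative(data))
```

It visits values in the same depth-first order as the recursive version, so later values still overwrite earlier ones under the same key. Since the function never calls itself, the nesting depth it can handle is bounded by memory rather than by the interpreter's recursion limit.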

A few follow-up questions from the comments:

  1. Difference between yield and yield from?

    yield <value> makes a generator yield that value directly. As you've probably read by now, a generator runs until it has a value to yield, yields it and pauses, then resumes when the next value is requested, and so on. yield from <some other generator> makes the generator request values from another generator and yield each of them in turn (one at a time) until that generator is exhausted; only then does it continue with the rest of its own code, up to the next yield.

    In the solution, _compound_key_value(x, prefix) starts a new generator recursively, which will start yielding values, which are yielded one by one using yield from. If the code had been yield _compound_key_value(x, prefix) (without the from), it would have yielded the generator itself, instead of values from it - that can be useful as well, but not here.

    The same could be achieved with for a in _compound_key_value(x, prefix): yield a, except that would be slower, because yield from has the new generator yield directly to the caller of this generator, without intermediate steps. The yield from form is also easier to read.

    TL;DR: yield x yields x itself; yield from x requires x to be an iterable (such as a generator) and yields everything from x, one item at a time.
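
A small side-by-side comparison may make the difference easier to see. The generator names below are made up for this illustration and have nothing to do with the solution itself.

```python
def inner():
    yield 1
    yield 2


def with_yield_from():
    yield from inner()  # passes along 1, then 2
    yield 3


def with_plain_yield():
    yield inner()  # yields the generator object itself, not its values
    yield 3


def with_loop():
    for value in inner():  # same values as yield from, spelled out as a loop
        yield value
    yield 3


print(list(with_yield_from()))  # [1, 2, 3]
print(list(with_loop()))        # [1, 2, 3]

first, second = with_plain_yield()
print(type(first).__name__, second)  # generator 3
```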

  2. Why are you passing an empty tuple as prefix?

    To avoid having to check if prefix has some value at all, it needs an initial value and I chose to pass the empty tuple instead of setting the default to the empty tuple in the signature like this:
    def _compound_key_value(xs, prefix=()):

    Either works, but I felt no default looked cleaner and since _compound_key_value is an internal function, not intended for direct use outside functions like flatten_dict_list, the requirement to pass an empty tuple when calling it seemed reasonable.

    TL;DR: as a default, prefix=() would have also worked.

  3. What is this doing? (k, )

    It is part of one statement: yield prefix + (k,) + p, r. This yields a tuple of two values, the first being prefix + (k,) + p. prefix is the function parameter, which expects a tuple, and p is also a tuple, since it is the first half of the tuple yielded by the recursive call. Adding three tuples together gives a new tuple with all the parts combined in order. So (k,) takes a key obtained from xs.items() and puts it in a tuple by itself, so that it can be added to the other tuples; the combined tuple is then yielded as the first half of a pair, with r as the second half.

    TL;DR: (k,) makes a new tuple, with a single element k.
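
The tuple arithmetic can be tried out on its own. The values below are invented for the example; only the expression prefix + (k,) + p, r comes from the solution.

```python
prefix = ('c',)  # example value for the prefix parameter
k = 'b'          # a key, as obtained from xs.items()
p = ('x',)       # key path yielded by the recursive call
r = 5            # value yielded by the recursive call

print(type((k)).__name__)   # str: parentheses alone do not make a tuple
print(type((k,)).__name__)  # tuple: the trailing comma does

key = prefix + (k,) + p
print(key)            # ('c', 'b', 'x')
print('.'.join(key))  # c.b.x

# The comma binds more loosely than +, so this builds a two-element tuple:
pair = prefix + (k,) + p, r
print(pair)           # (('c', 'b', 'x'), 5)
```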

Upvotes: 1
