turnip

Reputation: 2346

Cleaner way to unpack nested dictionaries

I am receiving data in batches from an API in JSON format. I wish to store only the values, in a list.

The raw data always has the following shape; every {...} has the same structure as the first element:

data = content.get('data')
>>> [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {...}, {...}, ...]

The nested dictionary is making this harder; I need this unpacked as well.

Here is what I have. It works, but it feels clumsy:

unpacked = []
data = content.get('data')
for d in data:
    item = []
    for k, v in d.items():
        if k == 'b':
            for val in v.values():
                item.append(val)
        else:
            item.append(v)
    unpacked.append(item)

Output:

>>> [[1,2,3,4], [...], [...], ...]

How can I improve this?

Upvotes: 3

Views: 14966

Answers (5)

Adam Erickson

Reputation: 6363

For completeness, based on the excellent answer of Eric Duminil, here is a function that returns the maximum depth of a nested dict or list:

def depth(it, count=0):
    """Depth of a nested dict.
    # Arguments
        it: a nested dict or list.
        count: the current depth, used internally by the recursion.
    # Returns
        Zero-based maximum depth as an int.
    """
    if isinstance(it, dict):
        it = list(it.values())
    if isinstance(it, list):
        # Recurse into every nested container and keep the deepest branch,
        # rather than returning from the first nested element found.
        nested = [v for v in it if isinstance(v, (list, dict))]
        if nested:
            return max(depth(v, count + 1) for v in nested)
    return count

In the Python tradition, it is zero-based.
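To make "zero-based" concrete, here is a minimal iterative sketch of the same idea, using an explicit stack instead of recursion. The name max_depth_iterative and the sample data are illustrative assumptions, not part of the answer above.

```python
def max_depth_iterative(obj):
    """Zero-based maximum nesting depth of dicts and lists (hypothetical helper)."""
    deepest = 0
    stack = [(obj, 0)]  # pairs of (object, depth at which it sits)
    while stack:
        current, level = stack.pop()
        deepest = max(deepest, level)
        if isinstance(current, dict):
            children = current.values()
        elif isinstance(current, list):
            children = current
        else:
            continue  # scalars have no children
        for child in children:
            # Only nested containers add a level.
            if isinstance(child, (dict, list)):
                stack.append((child, level + 1))
    return deepest


flat = {'a': 1}
item = {'a': 1, 'b': {'c': 2, 'd': 3}, 'e': 4}
data = [item, {'f': 5, 'g': 6}]

print(max_depth_iterative(flat))  # 0: no nested container
print(max_depth_iterative(item))  # 1: one dict inside the dict
print(max_depth_iterative(data))  # 2: list -> dict -> dict
```

A flat container has depth 0, and each further level of nesting adds one.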

Upvotes: 1

Kaushik NP

Reputation: 6781

Doing it recursively:

# Relies on the global list `l`, which the loop below resets for each item.
def traverse(d):
    for key, val in d.items():
        if isinstance(val, dict):
            traverse(val)
        else:
            l.append(val)

out = []
for d in data:
    l = []
    traverse(d)
    out.append(l)

print(out)

Driver values:

IN: data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]
OUT: out = [[1, 2, 3, 4], [5, 6]]

EDIT: A better way to do this is to use yield (together with yield from, available since Python 3.3), which avoids relying on a global variable as the first method does.

def traverse(d):
    for key, val in d.items():
        if isinstance(val, dict):
            yield from traverse(val)
        else:
            yield val

out = [list(traverse(d)) for d in data]

Upvotes: 0

Guillaume

Reputation: 6009

Other answers (especially @COLDSPEED's) have already covered the situation, but here is a slightly different version based on the old adage "it's better to ask forgiveness than permission", which I tend to prefer to type checking:

def unpack(data):
    try:
        for value in data.values():
            yield from unpack(value)
    except AttributeError:
        yield data


data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}]
unpacked = [list(unpack(item)) for item in data]
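One consequence of this duck-typed approach is worth seeing in action: anything without a .values() method is treated as a leaf, so an inner list is yielded whole rather than flattened. A minimal sketch, repeating the function so it runs on its own; the sample dictionary is made up for illustration.

```python
def unpack(data):
    try:
        for value in data.values():
            yield from unpack(value)
    except AttributeError:
        # No .values() method: treat the object as a leaf.
        yield data


sample = {'a': [1, 2], 'b': {'c': 3}}  # illustrative data with an inner list
result = list(unpack(sample))
print(result)  # [[1, 2], 3]
```

If inner lists should be flattened too, an explicit branch for lists is needed, as in the type-testing answers.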

Upvotes: 0

cs95

Reputation: 402573

Assuming your dictionaries do not contain inner lists, you could define a simple routine to unpack a nested dictionary, and iterate through each item in data using a loop.

def unpack(data):
    for k, v in data.items():
        if isinstance(v, dict):
            yield from unpack(v)
        else:
            yield v

Note that this function is as simple as it is thanks to the magic of yield from. Now, let's call it with some data.

data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]  # Data "borrowed" from Kaushik NP
result = [list(unpack(x)) for x in data]

print(result)
[[2, 3, 1, 4], [5, 6]]

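Note that the values above do not come out in insertion order: before Python 3.7, dictionaries did not guarantee any ordering. On Python 3.7 and later, dictionaries preserve insertion order, so the same code prints [[1, 2, 3, 4], [5, 6]].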
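If the order of the unpacked values must not depend on how the dictionaries were built, one option is to visit the keys in sorted order. This is a minimal sketch, assuming all keys at a given level are mutually comparable (for example, all strings); unpack_sorted is a hypothetical name.

```python
def unpack_sorted(data):
    # Visit keys in sorted order so the output order is deterministic.
    for key in sorted(data):
        value = data[key]
        if isinstance(value, dict):
            yield from unpack_sorted(value)
        else:
            yield value


# Same values as before, but with keys deliberately inserted out of order.
data = [{'e': 4, 'b': {'d': 3, 'c': 2}, 'a': 1}, {'g': 6, 'f': 5}]
result = [list(unpack_sorted(x)) for x in data]
print(result)  # [[1, 2, 3, 4], [5, 6]]
```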

Upvotes: 2

Eric Duminil

Reputation: 54233

You could use a recursive function and some type tests:

data = [{'a':1, 'b':{'c':2, 'd':3}, 'e':4}, {'f':5,'g':6}]

def extract_nested_values(it):
    if isinstance(it, list):
        for sub_it in it:
            yield from extract_nested_values(sub_it)
    elif isinstance(it, dict):
        for value in it.values():
            yield from extract_nested_values(value)
    else:
        yield it

print(list(extract_nested_values(data)))
# [1, 2, 3, 4, 5, 6]

Note that it yields a single flat sequence of values, not a list of lists.
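Since the question asks for one list per top-level item, the same generator can be applied to each element of data separately instead of to data as a whole. A sketch, repeating the function so the snippet runs on its own:

```python
def extract_nested_values(it):
    if isinstance(it, list):
        for sub_it in it:
            yield from extract_nested_values(sub_it)
    elif isinstance(it, dict):
        for value in it.values():
            yield from extract_nested_values(value)
    else:
        yield it


data = [{'a': 1, 'b': {'c': 2, 'd': 3}, 'e': 4}, {'f': 5, 'g': 6}]

# One flat list per top-level dictionary.
unpacked = [list(extract_nested_values(item)) for item in data]
print(unpacked)  # [[1, 2, 3, 4], [5, 6]] on Python 3.7+
```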

Upvotes: 6
