LittleBobbyTables

Reputation: 4473

Flattening a list of dicts of lists of dicts (etc) of unknown depth in Python (nightmarish JSON structure)

I'm dealing with JSON data that arrives in structures like this:

[{u'item': u'something',
  u'data': {
            u'other': u'',
            u'else':
               [
                  {
                    u'more': u'even more',
                    u'argh':
                         {
                            ...etc..etc

As you can see, these are nested dicts and lists. There is much discussion about flattening these recursively, but I haven't yet found an approach that can deal with a list of dictionaries which may in turn contain dictionaries of lists, lists of lists, dictionaries of dictionaries, and so on, all of unknown depth. In some cases the depth may be up to 100 or so. This is what I've been trying so far, without much luck (Python 2.7.2):

def flatten(structure):
    out = []
    for item in structure:
        if isinstance(item, (list, tuple)):
            out.extend(flatten(item))
        if isinstance(item, (dict)):
            for dictkey in item.keys():
                out.extend(flatten(item[dictkey]))
        else:
            out.append(item)
    return out

Any ideas?

UPDATE: This pretty much works:

def flatten(l):
    out = []
    if isinstance(l, (list, tuple)):
        for item in l:
            out.extend(flatten(item))
    elif isinstance(l, (dict)):
        for dictkey in l.keys():
            out.extend(flatten(l[dictkey]))
    elif isinstance(l, (str, int, unicode)):
        out.append(l)
    return out

Upvotes: 6

Views: 5322

Answers (1)

jsbueno

Reputation: 110486

Since the depth of your data is arbitrary, it is easier to resort to recursion to flatten it. This function creates a flat dictionary, with the path to each data item composed as the key, in order to avoid collisions.

You can retrieve its contents later in key order with, for example, a loop such as: for key in sorted(dic_.keys())

I didn't test it, since you did not provide a "valid" snippet of your data.

def flatten(structure, key="", path="", flattened=None):
    if flattened is None:
        flattened = {}
    # Full path to this node; skip the "_" separator while either part is empty,
    # so keys do not start with stray underscores.
    full_key = path + "_" + key if path and key else path or key
    if not isinstance(structure, (dict, list)):
        # Leaf value: store it under its full path.
        flattened[full_key] = structure
    elif isinstance(structure, list):
        for i, item in enumerate(structure):
            flatten(item, "%d" % i, full_key, flattened)
    else:
        for new_key, value in structure.items():
            flatten(value, new_key, full_key, flattened)
    return flattened
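
To make the path-as-key idea concrete, here is a small self-contained sketch of the same approach together with the sorted iteration mentioned above. The function name flatten_paths and the sample data are made up for illustration (the question did not include a complete snippet), so treat it as one possible variant rather than a tested drop-in for the real data.

```python
def flatten_paths(structure, path="", flattened=None):
    """Map each leaf value to a key built from the path leading to it."""
    if flattened is None:
        flattened = {}
    if isinstance(structure, dict):
        children = structure.items()
    elif isinstance(structure, (list, tuple)):
        # List positions become path components: 0, 1, 2, ...
        children = enumerate(structure)
    else:
        # Leaf value (string, number, None, ...): store it under its path.
        flattened[path] = structure
        return flattened
    for key, value in children:
        # Join with "_" only when there is already a path prefix.
        child_path = "%s_%s" % (path, key) if path else "%s" % (key,)
        flatten_paths(value, child_path, flattened)
    return flattened


# Hypothetical sample data, loosely shaped like the snippet in the question.
sample = [{"item": "something",
           "data": {"other": "",
                    "else": [{"more": "even more"}]}}]

flat = flatten_paths(sample)
for key in sorted(flat.keys()):
    print("%s = %r" % (key, flat[key]))
```

Two caveats with this sketch: the keys are sorted as strings, so an index like "10" sorts before "2", and a dictionary key that itself contains "_" can produce the same path as a nested key, so a different separator may be safer if your keys use underscores. Empty lists and dictionaries produce no entries at all.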

Upvotes: 11
