halfcup

Reputation: 31

Using map and lambda to handle nested dicts

I currently have a dict like this (assume many countries, states, and cities):

'USA': {
    'Texas': {
        'Austin': {
            '2017-01-01': 169,
            '2017-02-01': 231
        },
        'Houston': {
            '2017-01-01': 265,
            '2017-02-01': 310
        }
    }
}

I want to create a new dict that "groups by" only country and date, filtering for a given state, so the result would be:

'USA': {
    '2017-01-01': 434,
    '2017-02-01': 541
}

I can do this by looping over each layer of the dict, but it's hard to read. Is there a way to do this with lambda/map functions instead?
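For reference, the loop-based version I'm trying to replace looks roughly like this (a sketch; the source dict is called data and the state is 'Texas' here for illustration):

result = {}
for country, states in data.items():
    totals = {}
    for city, dates in states.get('Texas', {}).items():
        for date, value in dates.items():
            totals[date] = totals.get(date, 0) + value
    result[country] = totals
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}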

Also, we are unable to use pandas dataframes for other reasons, so I can't use that groupby feature.

Upvotes: 1

Views: 1732

Answers (3)

pault

Reputation: 43504

Here's a way to possibly simplify your existing code using collections.Counter. Suppose your source dictionary is named d:

from collections import Counter

my_state = 'Texas'
# Build {country: [Counter of date totals for each city in my_state]}
mapped = {
    country: [Counter(d[country][my_state][city]) for city in d[country][my_state]]
    for country in d
}
print(mapped)
#{'USA': [Counter({'2017-01-01': 265, '2017-02-01': 310}),
#  Counter({'2017-01-01': 169, '2017-02-01': 231})]}

This maps your original dictionary into one of the form {country: list_of_counters}.

Now you can use operator.add() to reduce this list:

from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
from operator import add

for country in mapped:
    print("{country}: {sums}".format(country=country, sums=reduce(add, mapped[country])))
#USA: Counter({'2017-02-01': 541, '2017-01-01': 434})

Or as map/reduce:

print(list(map(lambda country: {country: reduce(add, mapped[country])}, mapped)))
#[{'USA': Counter({'2017-01-01': 434, '2017-02-01': 541})}]

If you prefer to have dicts instead of Counters:

print(list(map(lambda country: {country: dict(reduce(add, mapped[country]))}, mapped)))
#[{'USA': {'2017-01-01': 434, '2017-02-01': 541}}]
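If you'd rather end up with a single dict keyed by country (matching the shape in the question) instead of a list of one-entry dicts, a comprehension works as well (sketch, reusing add and reduce from above):

result = {country: dict(reduce(add, mapped[country])) for country in mapped}
#{'USA': {'2017-01-01': 434, '2017-02-01': 541}}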

Upvotes: 0

jpp

Reputation: 164693

If you only want to extract the lowest level values of your nested dictionary, this can be achieved using a generator.

The generator below is a slightly modified version of one written by @Richard.

You can then combine this with collections.defaultdict to obtain your desired result.

from collections import defaultdict

def NestedDictValues(d):
    for k, v in d.items():
        if isinstance(v, dict):
            yield from NestedDictValues(v)
        else:
            yield (k, v)

def sumvals(lst):
    c = defaultdict(int)
    for i, j in lst:
        c[i] += j
    return dict(c)

# s is the nested dict from the question; 'USA' is hard-coded here
d = {'USA': sumvals(NestedDictValues(s))}

# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}
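If you also need the state filter from the question, one option (assuming the state key exists under every country) is to pass only that branch of the dict to the generator:

my_state = 'Texas'
d = {country: sumvals(NestedDictValues(s[country][my_state])) for country in s}
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}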

Upvotes: 1

Ajax1234

Reputation: 71451

I believe that in this case, it is much cleaner to use recursion than a map or reduce function:

import re
import itertools

s = {
    'USA': {
        'Texas': {
            'Austin': {
                '2017-01-01': 169,
                '2017-02-01': 231
            },
            'Houston': {
                '2017-01-01': 265,
                '2017-02-01': 310
            }
        }
    }
}
def get_dates(d):
    # Collect (date, value) pairs from the leaves, flattening as we go back up
    val = [(a, b) if isinstance(b, int) and re.findall(r'\d+-\d+-\d+', a) else get_dates(b) for a, b in d.items()]
    return [i for b in val for i in b] if not all(isinstance(i, tuple) for i in val) else val

last_data = {
    a: {c: sum(g for _, g in h)
        for c, h in itertools.groupby(sorted(get_dates(b), key=lambda x: x[0]), key=lambda x: x[0])}
    for a, b in s.items()
}

Output:

{'USA': {'2017-02-01': 541, '2017-01-01': 434}}
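To apply the state filter from the question, one possibility (assuming a my_state variable and that the key exists under each country) is to pass only that branch to get_dates:

my_state = 'Texas'
last_data = {
    a: {c: sum(g for _, g in h)
        for c, h in itertools.groupby(sorted(get_dates(b[my_state]), key=lambda x: x[0]), key=lambda x: x[0])}
    for a, b in s.items()
}
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}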

Upvotes: 0
