Reputation: 31
I currently have a dict like this (assume many countries, states, and cities):
'USA': {
    'Texas': {
        'Austin': {
            '2017-01-01': 169,
            '2017-02-01': 231
        },
        'Houston': {
            '2017-01-01': 265,
            '2017-02-01': 310
        }
    }
}
I want to create a new dict "grouping by" only country and date, filtering for a given state, so the result would be:
'USA': {
    '2017-01-01': 434,
    '2017-02-01': 541
}
I can do this by looping over each layer of the dict, but it's hard to read. Is there a way to do this with lambda/map functions instead?
Also, we are unable to use pandas dataframes for other reasons, so I can't use that groupby feature.
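For reference, here is a sketch of the nested-loop version I'd like to replace (assuming the source dict is bound to a variable, say data, and the state filter is passed in as a plain string):
def group_by_country_and_date(data, state):
    # Walk country -> state -> city -> date and accumulate sums per (country, date).
    result = {}
    for country, states in data.items():
        if state not in states:
            continue
        totals = result.setdefault(country, {})
        for city, dates in states[state].items():
            for date, value in dates.items():
                totals[date] = totals.get(date, 0) + value
    return result

# group_by_country_and_date(data, 'Texas')
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}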
Upvotes: 1
Views: 1732
Reputation: 43504
Here's a way to possibly simplify your existing code using collections.Counter. Suppose your source dictionary is named d:
from collections import Counter

my_state = 'Texas'
mapped = {
    country: [Counter(d[country][my_state][city]) for city in d[country][my_state]]
    for country in d
}
print(mapped)
#{'USA': [Counter({'2017-01-01': 265, '2017-02-01': 310}),
#         Counter({'2017-01-01': 169, '2017-02-01': 231})]}
This maps your original dictionary into one of the form {country: list_of_counters}.
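Adding two Counters sums their values key-wise, which is what makes the reduction below work. A quick illustration with values taken from your data:
from collections import Counter

# Counter addition merges the keys and sums the counts of shared keys.
Counter({'2017-01-01': 169}) + Counter({'2017-01-01': 265, '2017-02-01': 310})
# Counter({'2017-01-01': 434, '2017-02-01': 310})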
Now you can use operator.add() to reduce this list:
from operator import add
from functools import reduce  # reduce is a builtin only in Python 2

for country in mapped:
    print("{country}: {sums}".format(country=country, sums=reduce(add, mapped[country])))
#USA: Counter({'2017-02-01': 541, '2017-01-01': 434})
Or as map/reduce:
list(map(lambda country: {country: reduce(add, mapped[country])}, mapped))
#[{'USA': Counter({'2017-01-01': 434, '2017-02-01': 541})}]
If you prefer to have dicts instead of Counters:
list(map(lambda country: {country: dict(reduce(add, mapped[country]))}, mapped))
#[{'USA': {'2017-01-01': 434, '2017-02-01': 541}}]
Upvotes: 0
Reputation: 164693
If you only want to extract the lowest level values of your nested dictionary, this can be achieved using a generator.
The generator below is a slightly modified version of one written by @Richard. You can then combine this with collections.defaultdict to obtain your desired result.
from collections import defaultdict

def NestedDictValues(d):
    # Recursively walk the nested dict and yield (key, value) pairs from the
    # leaves (here, the dates and their counts).
    for k, v in d.items():
        if isinstance(v, dict):
            yield from NestedDictValues(v)
        else:
            yield (k, v)

def sumvals(lst):
    # Sum the values per date key.
    c = defaultdict(int)
    for i, j in lst:
        c[i] += j
    return dict(c)

d = {'USA': sumvals(NestedDictValues(s))}
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}
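If you also need to restrict the sum to a single state, as in the question, one possible variant (assuming the source dict is named s, as above) is:
my_state = 'Texas'  # hypothetical filter value
filtered = {
    country: sumvals(NestedDictValues(states[my_state]))
    for country, states in s.items()
    if my_state in states
}
# {'USA': {'2017-01-01': 434, '2017-02-01': 541}}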
Upvotes: 1
Reputation: 71451
I believe that in this case, it is much cleaner to use recursion than a map or reduce function:
import re
import itertools

s = {
    'USA': {
        'Texas': {
            'Austin': {
                '2017-01-01': 169,
                '2017-02-01': 231
            },
            'Houston': {
                '2017-01-01': 265,
                '2017-02-01': 310
            }
        }
    }
}

def get_dates(d):
    # Keep (date, value) pairs as-is; recurse into any nested dicts.
    val = [(a, b) if isinstance(b, int) and re.findall(r'\d+-\d+-\d+', a) else get_dates(b)
           for a, b in d.items()]
    # Flatten one level whenever the results still contain nested lists.
    return [i for b in val for i in b] if not all(isinstance(i, tuple) for i in val) else val

last_data = {
    a: {c: sum(g for _, g in h)
        for c, h in itertools.groupby(sorted(get_dates(b), key=lambda x: x[0]), key=lambda x: x[0])}
    for a, b in s.items()
}
Output:
{'USA': {'2017-02-01': 541, '2017-01-01': 434}}
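For reference, the intermediate get_dates step flattens the nested structure into (date, value) pairs before the groupby (the ordering shown assumes the insertion order of the dict above):
get_dates(s['USA'])
# [('2017-01-01', 169), ('2017-02-01', 231), ('2017-01-01', 265), ('2017-02-01', 310)]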
Upvotes: 0