Reputation: 77
I'm struggling to wrap my head around this one. I've got a list with multiple dictionaries that I would like to aggregate based on two values. Example code:
>>> data = [
... { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
... { "regex": ".*nft-r.*", "age": 23, "count": 44 },
... { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
... { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
... { "regex": ".*nft-r.*", "age": 23, "count": 46 },
... { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
... ]
I'm trying to merge the dicts that share the same age and regex, summing the count key across all matching entries. Example output would be:
>>> data = [
... { "regex": ".*ccc-r.*", "age": 44, "count": 244 },
... { "regex": ".*nft-r.*", "age": 23, "count": 90 },
... { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
... { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
... ]
I'd like to do this without pandas or add-on modules; a solution from the std lib would be ideal if at all possible.
Thanks!
Upvotes: 1
Views: 187
Reputation: 11721
If you're not opposed to using a library (and a slightly different output), this can be done nicely with pandas:
import pandas as pd
df = pd.DataFrame(data)
df.groupby(['regex', 'age']).sum()
This yields
               count
regex     age
.*ccc-r.* 32      16
          44     244
.*nft-r.* 23      90
.*zxy-r.* 16      55
Upvotes: 0
Reputation: 8302
You can also try,
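agg = {}
for d in data:
    key = (d['regex'], d['age'])  # group on both fields, not just regex
    if key in agg:
        agg[key]['count'] += d['count']
    else:
        agg[key] = dict(d)  # copy so the original dicts are not mutated
print(list(agg.values()))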
Upvotes: 1
Reputation: 69903
Assuming you do not want to use any imports, you can first collect the data in a dictionary aggregated_data in which the key is a tuple of (regex, age) and the value is the count. Once you have formed this dictionary, you can rebuild the original structure you had:
data = [
{ "regex": ".*ccc-r.*", "age": 44, "count": 224 },
{ "regex": ".*nft-r.*", "age": 23, "count": 44 },
{ "regex": ".*ccc-r.*", "age": 44, "count": 20 },
{ "regex": ".*ccc-r.*", "age": 32, "count": 16 },
{ "regex": ".*nft-r.*", "age": 23, "count": 46 },
{ "regex": ".*zxy-r.*", "age": 16, "count": 55 }
]
aggregated_data = {}
for dictionary in data:
    key = (dictionary['regex'], dictionary['age'])
    aggregated_data[key] = aggregated_data.get(key, 0) + dictionary['count']
data = [{'regex': key[0], 'age': key[1], 'count': value} for key, value in aggregated_data.items()]
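The same tuple-key idea can be wrapped in a small helper so the grouping fields are not hard-coded. This is only a sketch: the function name aggregate_counts and its parameters group_keys and sum_key are made up for illustration, and it assumes every dict contains all of the named keys.

```python
def aggregate_counts(rows, group_keys=("regex", "age"), sum_key="count"):
    """Sum `sum_key` over rows that share the same values for `group_keys`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    totals = {}
    for row in rows:
        # Tuple of the grouping values, e.g. ('.*ccc-r.*', 44)
        key = tuple(row[k] for k in group_keys)
        totals[key] = totals.get(key, 0) + row[sum_key]
    # Rebuild a list of dicts. Plain dicts keep insertion order
    # (Python 3.7+), so groups appear in first-seen order.
    return [
        {**dict(zip(group_keys, key)), sum_key: total}
        for key, total in totals.items()
    ]


data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

result = aggregate_counts(data)
print(result)
```

Because the helper returns a new list, the input data is left untouched, which makes it easy to compare before and after.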
Upvotes: 1
Reputation: 71461
You can use collections.defaultdict:
from collections import defaultdict
d = defaultdict(int)
data = [{'regex': '.*ccc-r.*', 'age': 44, 'count': 224}, {'regex': '.*nft-r.*', 'age': 23, 'count': 44}, {'regex': '.*ccc-r.*', 'age': 44, 'count': 20}, {'regex': '.*ccc-r.*', 'age': 32, 'count': 16}, {'regex': '.*nft-r.*', 'age': 23, 'count': 46}, {'regex': '.*zxy-r.*', 'age': 16, 'count': 55}]
for i in data:
    d[(i['regex'], i['age'])] += i['count']
r = [{'regex':a, 'age':b, 'count':c} for (a, b), c in d.items()]
Output:
[{'regex': '.*ccc-r.*', 'age': 44, 'count': 244},
{'regex': '.*nft-r.*', 'age': 23, 'count': 90},
{'regex': '.*ccc-r.*', 'age': 32, 'count': 16},
{'regex': '.*zxy-r.*', 'age': 16, 'count': 55}]
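For comparison, another std lib route (a different technique from the defaultdict approach above) is itertools.groupby. It only groups adjacent items, so the list has to be sorted by the same key first, which means the result comes out in sorted key order rather than first-seen order. A minimal sketch, with key_func and result as illustrative names:

```python
from itertools import groupby
from operator import itemgetter

data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

# itemgetter with two names returns a (regex, age) tuple for each row
key_func = itemgetter("regex", "age")

# groupby only merges adjacent rows, so sort by the grouping key first
result = [
    {"regex": regex, "age": age, "count": sum(row["count"] for row in group)}
    for (regex, age), group in groupby(sorted(data, key=key_func), key=key_func)
]
print(result)
```

The sort makes this O(n log n) versus a single pass for the dict-based versions, so it is mostly worth it when sorted output is wanted anyway.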
Upvotes: 2