Reputation: 34698
I am trying to merge some dicts according to specific requirements. Here is some example data:
data = [{"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 12.5},
{"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 22},
{"nid": 363, "cid": "cd9aaad4d5", "count": 57, "value": 49},
{"nid": 570, "cid": "cd9aaad4d5", "count": 58, "value": 62},
]
I need to merge all the dicts that share the same nid and cid and sum the value, but leave the count as it is.
So the above example would be returned as something like this (I did it by hand, so it might contain a mistake):
[
{'count': 58, 'value': 62, 'nid': 570, 'cid': 'cd9aaad4d5'},
{'count': 57, 'value': 34.5, 'nid': 363, 'cid': '509cd9aaad4d5'},
{'count': 57, 'value': 49, 'nid': 363, 'cid': 'cd9aaad4d5'}
]
My attempt so far is ugly, and I could really do with some guidance:
from collections import defaultdict
# tmp[nid][cid] holds [summed value, count]
tmp = defaultdict(lambda: defaultdict(lambda: [0, 0]))
for d in data:
    tmp[d["nid"]][d["cid"]][1] = d["count"]
    tmp[d["nid"]][d["cid"]][0] += d["value"]
print(tmp)
new_data = []
for key in tmp:
    for cid in tmp[key]:
        new_data.append({"nid": key, "cid": cid, "count": tmp[key][cid][1], "value": tmp[key][cid][0]})
print(new_data)
Can anyone help me find a cleaner, more intelligent way of merging the list of dicts?
Upvotes: 2
Views: 113
Reputation: 17629
Use pandas:
import pandas as pd
df = pd.DataFrame(data)
grouped = df.groupby(['nid', 'cid'])
s1 = grouped['value'].sum()    # sum of the values for each nid/cid pair
# assuming counts are the same for each nid/cid pair
s2 = grouped['count'].first()  # first count for each nid/cid pair
pd.DataFrame({'count': s2, 'value': s1})
Output:
nid|cid | count | value
---+-----------------+-------+------
363|509cd9aaad4d5 | 57 | 34.5
|cd9aaad4d5 | 57 | 49.0
570|cd9aaad4d5 | 58 | 62.0
If you don't like the hierarchical index, you can flatten the dataframe:
pd.DataFrame({'count': s2, 'value': s1}).reset_index()
Upvotes: 1
Reputation: 1122282
You can improve a little on your attempt by using a compound key:
from collections import defaultdict
tmp = defaultdict(lambda: {'value': 0})
for d in data:
    tmp[d["nid"], d["cid"]]['count'] = d["count"]   # last count seen wins
    tmp[d["nid"], d["cid"]]['value'] += d["value"]  # running sum per (nid, cid)
new_data = [{'nid': nid, 'cid': cid, 'count': v['count'], 'value': v['value']}
            for (nid, cid), v in tmp.items()]
The alternative would be to sort data and use itertools.groupby(), but the sort makes that more costly (O(n log n) instead of a single O(n) pass).
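For comparison, the sort-then-group alternative might look roughly like the sketch below. The helper name merge_sorted is made up for illustration, and it assumes every dict has the four keys from the question. As in the defaultdict version, the last count seen for a given (nid, cid) pair is the one that is kept.

```python
from itertools import groupby
from operator import itemgetter

data = [
    {"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 12.5},
    {"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 22},
    {"nid": 363, "cid": "cd9aaad4d5", "count": 57, "value": 49},
    {"nid": 570, "cid": "cd9aaad4d5", "count": 58, "value": 62},
]

# Compound key: groupby() only merges adjacent items, so the same key
# must be used for sorting and for grouping.
key = itemgetter("nid", "cid")


def merge_sorted(rows):
    """Hypothetical helper: sum 'value' per (nid, cid), keep the last 'count'."""
    merged = []
    for (nid, cid), group in groupby(sorted(rows, key=key), key=key):
        group = list(group)  # the group iterator can only be consumed once
        merged.append({
            "nid": nid,
            "cid": cid,
            "count": group[-1]["count"],  # sorted() is stable, so this is the last one seen
            "value": sum(d["value"] for d in group),
        })
    return merged


new_data = merge_sorted(data)
```

Unlike the dict-based version, the result comes back ordered by (nid, cid), which may be worth the extra cost of the sort if you need sorted output anyway.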
Upvotes: 1