Jakob Bowyer

Reputation: 34698

Intelligently merging dicts

I am trying to merge some dicts according to specific requirements. Here is some example data:

data = [{"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 12.5},
        {"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 22},
        {"nid": 363, "cid": "cd9aaad4d5", "count": 57, "value": 49},
        {"nid": 570, "cid": "cd9aaad4d5", "count": 58, "value": 62},
    ]

I need to merge all the dicts that share the same nid and cid and sum their value, but leave the count as it is.

So the above example would be returned as something like this (I did it by hand, so it might contain a mistake):

[
    {'count': 58, 'value': 62, 'nid': 570, 'cid': 'cd9aaad4d5'},
    {'count': 57, 'value': 34.5, 'nid': 363, 'cid': '509cd9aaad4d5'},
    {'count': 57, 'value': 49, 'nid': 363, 'cid': 'cd9aaad4d5'}
]

My attempt so far is ugly, and I could really do with some guidance:

from collections import defaultdict

# tmp[nid][cid] holds [summed value, count]
tmp = defaultdict(lambda: defaultdict(lambda: [0, 0]))
for d in data:
    tmp[d["nid"]][d["cid"]][1] = d["count"]
    tmp[d["nid"]][d["cid"]][0] += d["value"]

print tmp

new_data = []

for key in tmp:
    for cid in tmp[key]:
        new_data.append({"nid": key, "cid": cid, "count": tmp[key][cid][1], "value": tmp[key][cid][0]})

print new_data

Can anyone help me identify a far cleaner and more intelligent way of merging the list of dicts?

Upvotes: 2

Views: 113

Answers (2)

Matt

Reputation: 17629

Use pandas:

import pandas as pd

df = pd.DataFrame(data)
grouped = df.groupby(['nid', 'cid'])
s1 = grouped['value'].sum()    # sum of the values per (nid, cid)
# assuming counts are the same for each nid/cid tuple
s2 = grouped['count'].first()  # first count per (nid, cid)
pd.DataFrame({'count': s2, 'value': s1})

Output:

nid|cid              | count | value
---+-----------------+-------+------
363|509cd9aaad4d5    | 57    | 34.5
   |cd9aaad4d5       | 57    | 49.0
570|cd9aaad4d5       | 58    | 62.0

If you don't like the hierarchical index, you can flatten the dataframe:

pd.DataFrame({'count': s2, 'value': s1}).reset_index()

Upvotes: 1

Martijn Pieters

Reputation: 1122282

You can improve a little on your attempt by using a compound key:

from collections import defaultdict 

tmp = defaultdict(lambda: {'value': 0})
for d in data:
    tmp[d["nid"], d["cid"]]['count'] = d["count"]
    tmp[d["nid"], d["cid"]]['value'] += d["value"]

new_data = [{'nid': nid, 'cid': cid, 'count': v['count'], 'value': v['value']}
            for (nid, cid), v in tmp.items()]  # items() works on Python 2 and 3

The alternative would be to sort data and use itertools.groupby(), but the sort makes that more costly: O(n log n) against O(n) for the single pass above.
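
For completeness, here is a rough sketch of what the sort-and-group alternative could look like. It assumes the same data list as in the question and that count is identical for every row sharing a nid/cid pair, so it simply keeps the first count it sees in each group. The names key and merged are just illustrative.

```python
from itertools import groupby
from operator import itemgetter

data = [{"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 12.5},
        {"nid": 363, "cid": "509cd9aaad4d5", "count": 57, "value": 22},
        {"nid": 363, "cid": "cd9aaad4d5", "count": 57, "value": 49},
        {"nid": 570, "cid": "cd9aaad4d5", "count": 58, "value": 62},
    ]

# Compound key: groupby() only merges adjacent items, so sort on the same key first.
key = itemgetter("nid", "cid")

merged = []
for (nid, cid), group in groupby(sorted(data, key=key), key=key):
    rows = list(group)
    merged.append({
        "nid": nid,
        "cid": cid,
        "count": rows[0]["count"],  # assumption: count is the same within a group
        "value": sum(row["value"] for row in rows),
    })
```

A side effect of the sort is that merged comes out ordered by nid and then cid, which the defaultdict version does not guarantee. Unless that ordering is needed, the single-pass version is probably the better choice.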

Upvotes: 1
