J.Ester

Reputation: 33

Group by multiple keys and summarize/average multiple values of a list of dictionaries

I am new to Python and have run into an issue with the code below.

I am looking for a way to group by multiple keys and summarize/average multiple values of a list of dictionaries in Python. The code below (taken from a previous question and answer: Group by multiple keys and summarize/average values of a list of dictionaries) set me on the right track, but I am running into issues when I aggregate more than one field in the loop.

Say I have a list of dictionaries as seen below:

dict_list = [
{'msn': '001', 'source': 'foo', 'status': '1', 'qty': 100, 'vol': 100},
{'msn': '001', 'source': 'bar', 'status': '2', 'qty': 200, 'vol': 200},
{'msn': '001', 'source': 'foo', 'status': '1', 'qty': 300, 'vol': 300},
{'msn': '002', 'source': 'baz', 'status': '2', 'qty': 400, 'vol': 100},
{'msn': '002', 'source': 'baz', 'status': '1', 'qty': 500, 'vol': 400},
{'msn': '002', 'source': 'qux', 'status': '1', 'qty': 600, 'vol': 100},
{'msn': '003', 'source': 'foo', 'status': '2', 'qty': 700, 'vol': 200}]

My code so far:

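from itertools import groupby
from operator import itemgetter

grouper = itemgetter("msn", "source")  # assumed definition; it must return the (msn, source) pair
result = []
for key, grp in groupby(sorted(dict_list, key=grouper), grouper):
    temp_dict = dict(zip(["msn", "source"], key))
    temp_dict["qty"] = sum(item["qty"] for item in grp)
    temp_dict["vol"] = sum(item["vol"] for item in grp)
    result.append(temp_dict)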

Expected result was:

[{'msn': '001', 'source': 'foo', 'qty': 400, 'vol': 400},
{'msn': '001', 'source': 'bar', 'qty': 200, 'vol': 200},
{'msn': '002', 'source': 'baz', 'qty': 900, 'vol': 500},
{'msn': '002', 'source': 'qux', 'qty': 600, 'vol': 100},
{'msn': '003', 'source': 'foo', 'qty': 700, 'vol': 200}]

Adding temp_dict["vol"] = sum(item["vol"] for item in grp) inside the for loop does not produce the desired result, which is ultimately my issue.

How do I keep the key and grouping as shown in the code while adding (appending) another field and its calculated value to the list?

Thanks in advance for any help.

Upvotes: 3

Views: 996

Answers (1)

Paul Panzer

Reputation: 53029

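groupby yields each grp as a one-shot iterator, so the first sum consumes it and the second sum sees nothing. You need to "copy" grp if you want to iterate through it multiple times; itertools.tee can do that for you: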

from itertools import groupby, tee

for key, grp in groupby(sorted(dict_list, key=grouper), grouper):
    temp_dict = dict(zip(["msn", "source"], key))
    # tee returns two independent iterators over the same group
    grp1, grp2 = tee(grp)
    temp_dict["qty"] = sum(item["qty"] for item in grp1)
    temp_dict["vol"] = sum(item["vol"] for item in grp2)
    result.append(temp_dict)
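
For reference, here is a self-contained sketch of the same loop run against the sample data from the question. Two things in it are assumptions rather than part of the original posts: grouper is taken to be itemgetter("msn", "source"), since the question does not show its definition, and summarize is a hypothetical helper name used only to wrap the loop. The sketch also includes a list(grp) variant, because each group is fully consumed by the first sum before the second starts, so tee ends up buffering the whole group anyway and a plain list is a reasonable alternative.

```python
from itertools import groupby, tee
from operator import itemgetter

# Sample data from the question.
dict_list = [
    {'msn': '001', 'source': 'foo', 'status': '1', 'qty': 100, 'vol': 100},
    {'msn': '001', 'source': 'bar', 'status': '2', 'qty': 200, 'vol': 200},
    {'msn': '001', 'source': 'foo', 'status': '1', 'qty': 300, 'vol': 300},
    {'msn': '002', 'source': 'baz', 'status': '2', 'qty': 400, 'vol': 100},
    {'msn': '002', 'source': 'baz', 'status': '1', 'qty': 500, 'vol': 400},
    {'msn': '002', 'source': 'qux', 'status': '1', 'qty': 600, 'vol': 100},
    {'msn': '003', 'source': 'foo', 'status': '2', 'qty': 700, 'vol': 200},
]

# Assumption: the question does not define grouper; a (msn, source) tuple
# matches the zip(["msn", "source"], key) call in the loop.
grouper = itemgetter("msn", "source")


def summarize(rows, use_tee=True):
    """Hypothetical helper: sum qty and vol for each (msn, source) group."""
    result = []
    for key, grp in groupby(sorted(rows, key=grouper), grouper):
        temp_dict = dict(zip(["msn", "source"], key))
        if use_tee:
            # Two independent iterators over the same group.
            grp1, grp2 = tee(grp)
        else:
            # Materialize the group once; a list can be iterated repeatedly.
            grp1 = grp2 = list(grp)
        temp_dict["qty"] = sum(item["qty"] for item in grp1)
        temp_dict["vol"] = sum(item["vol"] for item in grp2)
        result.append(temp_dict)
    return result


result = summarize(dict_list)
for row in result:
    print(row)
```

Because the input is sorted by the grouping key before groupby sees it, the rows come out ordered by msn and then source, one row per distinct pair.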

Upvotes: 1
