Reputation: 818
I have a list of dictionaries, as shown below:
dataset = {"users": [
    {"id": 20, "loc": "Chicago", "st": "4", "sectors": [{"sname": "Retail"}, {"sname": "Manufacturing"}, {"sname": None}]},
    {"id": 21, "loc": "Frankfurt", "st": "4", "sectors": [{"sname": None}]},
    {"id": 22, "loc": "Berlin", "st": "6", "sectors": [{"sname": "Manufacturing"}, {"sname": "Banking"}, {"sname": "Agri"}]},
    {"id": 23, "loc": "Chicago", "st": "2", "sectors": [{"sname": "Banking"}, {"sname": "Agri"}]},
    {"id": 24, "loc": "Bern", "st": "1", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}]},
    {"id": 25, "loc": "Bern", "st": "4", "sectors": [{"sname": "Retail"}, {"sname": "Agri"}, {"sname": "Banking"}]}
]}
I tried the code below to remove st and sectors from the list above, so that each entry contains only id and loc:
import itertools

fs_loc = []
for g, items in itertools.groupby(dataset['users'], lambda x: (x['id'], x['loc'])):
    fs_loc.append({'id': g[0], 'loc': g[1]})
print(fs_loc)
From this, how can I create a new list that groups the entries by location and holds, for each location, the list of ids and their count, like below?
{"locations": [
{"loc": "Chicago","count":2,"ids": [{"id": "20"}, {"id": "23"}]},
{"loc": "Bern","count":2,"ids": [{"id": "24"}, {"id": "25"}]},
{"loc": "Frankfurt","count":1,"ids": [{"id": "21"}]},
{"loc": "Berlin","count":1,"ids": [{"id": "22"}]}
]}
I found it difficult to build the list above using itertools. I am probably missing a better approach; could you please suggest one?
Upvotes: 3
Views: 202
Reputation: 368944
You need to pass a sorted sequence to itertools.groupby.
According to the itertools.groupby documentation:
... Generally, the iterable needs to already be sorted on the same key function.
The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.
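To see the quoted behaviour concretely, here is a minimal sketch. The `locs` list is a hypothetical stand-in holding only the `loc` values, in the same order as the question's data; it is not part of the original code.

```python
import itertools

# Hypothetical sample: just the 'loc' values, in the question's original order.
locs = ["Chicago", "Frankfurt", "Berlin", "Chicago", "Bern", "Bern"]

# Unsorted input: "Chicago" occurs in two separate runs, so it yields two groups.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(locs)]

# Sorted input: equal values are adjacent, so each location yields one group.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(locs))]

print(unsorted_groups)
# [('Chicago', 1), ('Frankfurt', 1), ('Berlin', 1), ('Chicago', 1), ('Bern', 2)]
print(sorted_groups)
# [('Berlin', 1), ('Bern', 2), ('Chicago', 2), ('Frankfurt', 1)]
```

Without the sort, "Chicago" is reported twice with a count of 1 each, which is why the grouping in the question does not merge the two Chicago users.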
import itertools

byloc = lambda x: x['loc']

# groupby only merges adjacent items, so sort by the same key first.
it = (
    (loc, list(user_grp))
    for loc, user_grp in itertools.groupby(
        sorted(dataset['users'], key=byloc), key=byloc
    )
)

fs_loc = [
    {'loc': loc, 'ids': [x['id'] for x in grp], 'count': len(grp)}
    for loc, grp in it
]
fs_loc
→
[
{'count': 1, 'loc': 'Berlin', 'ids': [22]},
{'count': 2, 'loc': 'Bern', 'ids': [24, 25]},
{'count': 2, 'loc': 'Chicago', 'ids': [20, 23]},
{'count': 1, 'loc': 'Frankfurt', 'ids': [21]}
]
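If you need the exact nested shape asked for in the question (a "locations" wrapper, ids as {"id": "..."} entries), one possible sketch follows. It swaps groupby for a plain dict, so no sort is needed before grouping. The `users` list is a trimmed copy of the question's data, and two things are assumptions read off the desired output rather than stated requirements: ids are converted to strings, and larger groups come first.

```python
# Trimmed copy of the question's data (only the fields used here).
users = [
    {"id": 20, "loc": "Chicago"},
    {"id": 21, "loc": "Frankfurt"},
    {"id": 22, "loc": "Berlin"},
    {"id": 23, "loc": "Chicago"},
    {"id": 24, "loc": "Bern"},
    {"id": 25, "loc": "Bern"},
]

# Collect ids per location; dicts keep insertion order (Python 3.7+).
ids_by_loc = {}
for user in users:
    ids_by_loc.setdefault(user["loc"], []).append(user["id"])

# Build the nested entries. Assumption: ids become strings, because the
# desired output in the question shows them quoted.
locations = [
    {"loc": loc, "count": len(ids), "ids": [{"id": str(i)} for i in ids]}
    for loc, ids in ids_by_loc.items()
]

# Assumption: largest groups first. sorted() is stable, so locations with
# equal counts keep their first-seen order.
result = {"locations": sorted(locations, key=lambda d: d["count"], reverse=True)}
print(result)
```

With this data the locations come out as Chicago, Bern, Frankfurt, Berlin. Drop the final sort if the order does not matter.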
Upvotes: 4