Helmi

Reputation: 539

Filter/group dictionary by nested value

Here's a simplified example of some data I have:

{"id": "1234565", "fields": {"name": "john", "email":"[email protected]", "country": "uk"}}

The whole nested dictionary is part of a bigger list of address data. The goal is to create pairs of people from the list with randomized partners, where partners from the same country should be preferred. So my first real issue is to find a good way to group them by that country value.

I'm sure there's a smarter way to do this than iterating through the dict and writing all records out to some new list/dict?

Upvotes: 0

Views: 1500

Answers (2)

Matt Eding

Reputation: 1002

Here is another one that uses defaultdict:

import collections

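def make_groups(nested_dicts, nested_key):
    """Group the dicts by the value stored under nested_key in any nested dict."""
    default = collections.defaultdict(list)
    for nested_dict in nested_dicts:
        for value in nested_dict.values():
            try:
                # Only nested dicts support value[nested_key] here.
                default[value[nested_key]].append(nested_dict)
            except TypeError:
                # Plain values such as the int or str id cannot be indexed
                # by a string key, so they are skipped.
                pass
    return default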
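
If the country always sits under "fields", as in the question's example, a more direct variant could index it explicitly instead of probing every value. This is only a sketch: the name group_by_country and the choice to collect records without a country under None are assumptions, not something from the question.

```python
from collections import defaultdict

def group_by_country(people):
    """Group person records by their ['fields']['country'] value."""
    groups = defaultdict(list)
    for person in people:
        # .get() files records without a country under the key None
        # instead of raising KeyError.
        country = person.get("fields", {}).get("country")
        groups[country].append(person)
    return groups
```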

To test make_groups:

import random

COUNTRY = ['af', 'br', 'fr', 'mx', 'uk']

people = [{'id': i, 'fields': {
                               'name': 'name'+str(i),
                               'email': str(i)+'@email',
                               # random.choice needs a sequence; random.sample on a set fails on Python 3.11+
                               'country': random.choice(COUNTRY)}}
          for i in range(10)]

country_groups = make_groups(people, 'country')

for country, persons in country_groups.items():
    print(country, persons)

Random output:

fr [{'id': 0, 'fields': {'name': 'name0', 'email': '0@email', 'country': 'fr'}}, {'id': 1, 'fields': {'name': 'name1', 'email': '1@email', 'country': 'fr'}}, {'id': 4, 'fields': {'name': 'name4', 'email': '4@email', 'country': 'fr'}}]
br [{'id': 2, 'fields': {'name': 'name2', 'email': '2@email', 'country': 'br'}}, {'id': 8, 'fields': {'name': 'name8', 'email': '8@email', 'country': 'br'}}]
uk [{'id': 3, 'fields': {'name': 'name3', 'email': '3@email', 'country': 'uk'}}, {'id': 7, 'fields': {'name': 'name7', 'email': '7@email', 'country': 'uk'}}]
af [{'id': 5, 'fields': {'name': 'name5', 'email': '5@email', 'country': 'af'}}, {'id': 9, 'fields': {'name': 'name9', 'email': '9@email', 'country': 'af'}}]
mx [{'id': 6, 'fields': {'name': 'name6', 'email': '6@email', 'country': 'mx'}}]
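
For the pairing step mentioned in the question, one possible sketch follows. It takes a country-to-people mapping like the one printed above, pairs people within each country first, and then pairs whoever is left over across countries. The name make_pairs, the optional rng parameter and the (pairs, unmatched) return shape are made up for illustration, and the sample ids are invented.

```python
import random

def make_pairs(country_groups, rng=random):
    """Pair people at random, preferring partners from the same country."""
    pairs, leftovers = [], []
    for persons in country_groups.values():
        shuffled = list(persons)  # copy so the input groups stay untouched
        rng.shuffle(shuffled)
        # An odd-sized group leaves one person without a same-country partner.
        if len(shuffled) % 2:
            leftovers.append(shuffled.pop())
        pairs.extend(zip(shuffled[::2], shuffled[1::2]))
    # Pair the leftovers across countries; one person may remain unmatched.
    rng.shuffle(leftovers)
    unmatched = leftovers.pop() if len(leftovers) % 2 else None
    pairs.extend(zip(leftovers[::2], leftovers[1::2]))
    return pairs, unmatched

groups = {"uk": [{"id": 1}, {"id": 2}, {"id": 3}], "fr": [{"id": 4}]}
pairs, unmatched = make_pairs(groups)
print(pairs, unmatched)
```

Passing a seeded random.Random instance as rng makes the pairing reproducible, which can help when testing.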

Upvotes: 0

steliosbl

Reputation: 8921

I think this is close to what you need:

import itertools
result = {key: list(value) for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}

This uses itertools.groupby to group the entries of the people list by their country. The resulting dictionary has countries as keys and lists of the matching people as values. Input is expected as a list of dictionaries like the one in your example:

people = [{"id": "1234565", "fields": {"name": "john", "email":"[email protected]", "country": "uk"}}, 
          {"id": "654321", "fields": {"name": "sam", "email":"[email protected]", "country": "uk"}}]

Sample output:

>>> print(result)
{'uk': [{'fields': {'name': 'john', 'email': '[email protected]', 'country': 'uk'}, 'id': '1234565'}, {'fields': {'name': 'sam', 'email': '[email protected]', 'country': 'uk'}, 'id': '654321'}]}

For a cleaner result, the comprehension can be tweaked so that only the ID of each person is included in the result dict:

result = {key:[i["id"] for i in value] for key, value in itertools.groupby(people, lambda item: item["fields"]["country"])}
>>> print(result)
{'uk': ['1234565', '654321']}

EDIT: Sorry, I forgot about the sorting. groupby only groups consecutive items that share a key, so sort the list of people by country before putting it through groupby. It should then work properly:

people = sorted(people, key=lambda item: item["fields"]["country"])
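
Putting the sort and the grouping together, a complete run might look like the sketch below. The three sample records and the helper name by_country are invented for illustration; the same key function is used for both sorted and groupby so the two always agree.

```python
import itertools

people = [
    {"id": "1", "fields": {"name": "john", "country": "uk"}},
    {"id": "2", "fields": {"name": "ana", "country": "br"}},
    {"id": "3", "fields": {"name": "sam", "country": "uk"}},
]

def by_country(item):
    return item["fields"]["country"]

# groupby only merges consecutive items with equal keys, so sort first.
people_sorted = sorted(people, key=by_country)
result = {
    country: [person["id"] for person in group]
    for country, group in itertools.groupby(people_sorted, key=by_country)
}
print(result)  # {'br': ['2'], 'uk': ['1', '3']}
```

Without the sort, the two uk records would land in separate groups here, and the later group would overwrite the earlier one in the dict.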

Upvotes: 3
