Chris
Chris

Reputation: 5804

How to group an array by multiple keys?

I'd like a function that can group a list of dictionaries into sublists of dictionaries depending on an arbitrary set of keys that all dictionaries have in common.

For example, I'd like the following list to be grouped into sublists of dictionaries depending on a certain set of keys

l = [{'name':'b','type':'new','color':'blue','amount':100},{'name':'c','type':'new','color':'red','amount':100},{'name':'d','type':'old','color':'gold','amount':100},{'name':'e','type':'old','color':'red','amount':100},
{'name':'f','type':'old','color':'red','amount':100},{'name':'g','type':'normal','color':'red','amount':100}]

If I wanted to group by type, the following list would result, which has a sublists where each sublist has the same type:

[[{'name':'b','type':'new','color':'blue','amount':100},{'name':'c','type':'new','color':'red','amount':100}],[{'name':'d','type':'old','color':'gold','amount':100},{'name':'e','type':'old','color':'red','amount':100},
{'name':'f','type':'old','color':'red','amount':100}],[{'name':'g','type':'normal','color':'red','amount':100}]]

If I wanted to group by type and color, the following would result where the list contains sublists that have the same type and color:

[[{'name':'b','type':'new','color':'blue','amount':100}],[{'name':'c','type':'new','color':'red','amount':100}],[{'name':'d','type':'old','color':'gold','amount':100}],[{'name':'e','type':'old','color':'red','amount':100},
{'name':'f','type':'old','color':'red','amount':100}],[{'name':'g','type':'normal','color':'red','amount':100}]]

I understand the following function can group by one key, but I'd like to group by multiple keys:

 def group_by_key(l,i):

      l = [list(grp) for key, grp in itertools.groupby(sorted(l, key=operator.itemgetter(i)), key=operator.itemgetter(i))]

This is my attempt using the group_by_function above

 def group_by_multiple_keys(l,*keys):
      for key in keys:
          l = group_by_key(l,key)
          l = [item for sublist in l for item in sublist]
      return l 

The issue there is that it ungroups it right after it grouped it by a key. Instead, I'd like to re-group it by another key and still have one list of sublists.

Upvotes: 2

Views: 2297

Answers (1)

Cyphase
Cyphase

Reputation: 12002

itertools.groupby() + operator.itemgetter() will do what you want. groupby() takes an iterable and a key function, and groups the items in the iterable by the value returned by passing each item to the key function. itemgetter() is a factory that returns a function, which gets the specified items from any item passed to it.

from __future__ import print_function

import pprint

from itertools import groupby
from operator import itemgetter


def group_by_keys(iterable, keys):
    key_func = itemgetter(*keys)

    # For groupby() to do what we want, the iterable needs to be sorted
    # by the same key function that we're grouping by.
    sorted_iterable = sorted(iterable, key=key_func)

    return [list(group) for key, group in groupby(sorted_iterable, key_func)]


dicts = [
    {'name': 'b', 'type': 'new', 'color': 'blue', 'amount': 100},
    {'name': 'c', 'type': 'new', 'color': 'red', 'amount': 100},
    {'name': 'd', 'type': 'old', 'color': 'gold', 'amount': 100},
    {'name': 'e', 'type': 'old', 'color': 'red', 'amount': 100},
    {'name': 'f', 'type': 'old', 'color': 'red', 'amount': 100},
    {'name': 'g', 'type': 'normal', 'color': 'red', 'amount': 100}
    ]

Examples:

>>> pprint.pprint(group_by_keys(dicts, ('type',)))
[[{'amount': 100, 'color': 'blue', 'name': 'b', 'type': 'new'},
  {'amount': 100, 'color': 'red', 'name': 'c', 'type': 'new'}],
 [{'amount': 100, 'color': 'gold', 'name': 'd', 'type': 'old'},
  {'amount': 100, 'color': 'red', 'name': 'e', 'type': 'old'},
  {'amount': 100, 'color': 'red', 'name': 'f', 'type': 'old'}],
 [{'amount': 100, 'color': 'red', 'name': 'g', 'type': 'normal'}]]
>>> 
>>> pprint.pprint(group_by_keys(dicts, ('type', 'color')))
[[{'amount': 100, 'color': 'blue', 'name': 'b', 'type': 'new'}],
 [{'amount': 100, 'color': 'red', 'name': 'c', 'type': 'new'}],
 [{'amount': 100, 'color': 'gold', 'name': 'd', 'type': 'old'}],
 [{'amount': 100, 'color': 'red', 'name': 'e', 'type': 'old'},
  {'amount': 100, 'color': 'red', 'name': 'f', 'type': 'old'}],
 [{'amount': 100, 'color': 'red', 'name': 'g', 'type': 'normal'}]]

Upvotes: 3

Related Questions