zegkljan

Reputation: 8391

Python: group a list into sublists by equality of a projected value

Is there a nice Pythonic way of grouping a list into a list of lists, where each inner list contains only those elements that have the same projection, with the projection defined by the user as a function?

Example:

>>> x = [0, 1, 2, 3, 4, 5, 6, 7]
>>> groupby(x, projection=lambda e: e % 3)
[[0, 3, 6], [1, 4, 7], [2, 5]]

I don't care about the projected value itself, only that elements with equal projections end up in the same sublist.

I'm basically looking for a Python equivalent of the Haskell function GHC.Exts.groupWith:

Prelude> import GHC.Exts
Prelude GHC.Exts> groupWith (`mod` 3) [0..7]
[[0,3,6],[1,4,7],[2,5]]

Upvotes: 8

Views: 23924

Answers (4)

Colonel Beauvel

Reputation: 31161

Here is one approach using compress from itertools:

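from itertools import compress
import numpy as np

x = [0, 1, 2, 3, 4, 5, 6, 7]

# Projected value (key) of every element
L = [i % 3 for i in x]

# For each distinct key, keep the elements whose key matches
[list(compress(x, np.array(L) == i)) for i in set(L)]
# [[0, 3, 6], [1, 4, 7], [2, 5]]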
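
As a side note, the same compress idea can be sketched without NumPy by building the boolean selectors with a generator expression. This is only an illustration, not the answer's original code; the names keys and groups are made up here, and the list is scanned once per distinct key.

```python
from itertools import compress

x = [0, 1, 2, 3, 4, 5, 6, 7]

# Projected value (key) of every element
keys = [e % 3 for e in x]

# dict.fromkeys keeps each distinct key once, in first-seen order
groups = [
    list(compress(x, (k == key for k in keys)))
    for key in dict.fromkeys(keys)
]
print(groups)
```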

Upvotes: 1

fmarc

Reputation: 1726

The itertools module in the standard library contains a groupby() function that should do what you want.

Note that groupby() only groups consecutive elements with equal keys, so the input should be sorted by the group key to yield each group only once. It's easy to use the same key function for sorting. So if your key function (projection) takes a number modulo 3, it would look like this:

from itertools import groupby
x = [0, 1, 2, 3, 4, 5, 6, 7]

def projection(val):
    return val % 3

x_sorted = sorted(x, key=projection)
x_grouped = [list(it) for k, it in groupby(x_sorted, projection)]    
print(x_grouped)

[[0, 3, 6], [1, 4, 7], [2, 5]]
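
To see why the sort matters, here is a small sketch of what happens when the unsorted list is passed straight to groupby(). The variable name unsorted_groups is only for illustration. Because equal keys are never adjacent in x, every element ends up in its own group.

```python
from itertools import groupby

x = [0, 1, 2, 3, 4, 5, 6, 7]

def projection(val):
    return val % 3

# Without sorting, groupby starts a new group every time the key changes,
# so the keys 0, 1, 2, 0, 1, 2, 0, 1 produce eight one-element groups.
unsorted_groups = [list(it) for k, it in groupby(x, projection)]
print(unsorted_groups)
```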

Note that while this version only uses standard Python features, if you are dealing with more than maybe 100,000 values, you should look into pandas (see @ayhan's answer).

Upvotes: 15

user2285236

Reputation:

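A pandas version would be like this (note that a function passed to groupby is applied to the index labels, which here happen to coincide with the values):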

import pandas as pd
x = [0, 1, 2, 3, 4, 5, 6, 7]
pd.Series(x).groupby(lambda t: t%3).groups
Out[13]: {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]}

Or

pd.Series(x).groupby(lambda t: t%3).groups.values()
Out[32]: dict_values([[0, 3, 6], [1, 4, 7], [2, 5]])

Upvotes: 3

Alex Hall

Reputation: 36013

No need to sort.

from collections import defaultdict

def groupby(iterable, projection):
    result = defaultdict(list)
    for item in iterable:
        result[projection(item)].append(item)
    return result

x = [0, 1, 2, 3, 4, 5, 6, 7]
groups = groupby(x, projection=lambda e: e % 3)
print(groups)
print(groups[0])

Output:

defaultdict(<class 'list'>, {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]})
[0, 3, 6]
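
If a list of lists is wanted, as in the question, the values of the dictionary can be returned instead. The following is a minimal sketch along the same lines; the name group_with is hypothetical (borrowed from the Haskell function in the question), and it assumes Python 3.7+, where dictionaries keep insertion order, so groups appear in the order their keys are first seen.

```python
from collections import defaultdict

def group_with(iterable, projection):
    """Group items by projection(item); projected values must be hashable."""
    result = defaultdict(list)
    for item in iterable:
        result[projection(item)].append(item)
    # Drop the keys and keep only the groups, in first-seen key order
    return list(result.values())

x = [0, 1, 2, 3, 4, 5, 6, 7]
grouped = group_with(x, projection=lambda e: e % 3)
print(grouped)
```

Unlike the sorting approach, this makes a single pass over the input and only needs the projected values to be hashable, not orderable.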

Upvotes: 7
