thor

Reputation: 22520

How to get the groups generated by "groupby()" as lists?

I am testing itertools.groupby() and trying to get the groups as lists, but I can't figure out how to make it work.

Using the example from "How do I use Python's itertools.groupby()?":

from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
         ("vehicle", "speed boat"), ("vehicle", "school bus")]

I tried (Python 3.5):

g = groupby(things, lambda x: x[0])
ll = list(g)
list(tuple(ll[0])[1])

I thought I should get the first group ("animal") as a list ['bear', 'duck'], but I just get an empty list in the REPL.

What am I doing wrong?

How should I extract all three groups as lists?

Upvotes: 3

Views: 1215

Answers (3)

ShadowRanger

Reputation: 155507

If you just want the groups, without the keys, you need to realize the group generators as you go, per the docs:

Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.

This means that when you try to list-ify the groupby generator first using ll = list(g), before converting the individual group generators, all but the last group generator will be invalid/empty.

(Note that list is just one option; a tuple or any other container works too.)
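The failure mode described above can be reproduced with a minimal sketch. It reuses the `things` list from the question; the variable names `pairs`, `stale_first_group` and `fresh_groups` are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

# Running the groupby object to the end first, as in the question,
# leaves the earlier group iterators with nothing left to yield.
pairs = list(groupby(things, key=itemgetter(0)))
stale_first_group = list(pairs[0][1])
print(stale_first_group)  # []

# Consuming each group before asking for the next one keeps the data.
fresh_groups = [list(group) for _, group in groupby(things, key=itemgetter(0))]
print(fresh_groups[0])  # [('animal', 'bear'), ('animal', 'duck')]
```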

So to do it properly, you'd make sure to listify each group generator before moving on to the next:

from operator import itemgetter  # Nicer than ad-hoc lambdas

# Make the key, group generator
gen = groupby(things, key=itemgetter(0))

# Strip the keys; you only care about the group generators
# In Python 2, you'd use future_builtins.map, because a non-generator map would break
groups = map(itemgetter(1), gen)

# Convert them to list one by one before the next group is pulled
groups = map(list, groups)

# And listify the result (to actually run out the generator and get all your
# results, assuming you need them as a list)
groups = list(groups)

As a one-liner:

groups = list(map(list, map(itemgetter(1), groupby(things, key=itemgetter(0)))))

Or, because this many maps gets rather ugly/non-Pythonic, and list comprehensions let us do nifty stuff like unpacking to get named values, we can simplify to:

groups = [list(g) for k, g in groupby(things, key=itemgetter(0))]
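If the goal is the bare names the question expected (['bear', 'duck'] for "animal"), the same unpacking idea can also drop the key from each item. This is only a sketch: `names_by_kind` is a made-up name, and the dict form assumes each key appears in a single consecutive run, as it does in `things`.

```python
from itertools import groupby
from operator import itemgetter

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

# Each group is consumed inside the comprehension, before groupby advances.
# The inner comprehension keeps only the second element of every pair.
names_by_kind = {kind: [name for _, name in group]
                 for kind, group in groupby(things, key=itemgetter(0))}

print(names_by_kind["animal"])  # ['bear', 'duck']
print(list(names_by_kind.values()))
```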

Upvotes: 2

Iron Fist

Reputation: 10951

Quoting from the Python docs on groupby:

itertools.groupby(iterable, key=None)
Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.
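The last sentence of that quote matters in practice: groupby only merges consecutive items, so the same key can show up more than once if the input is not sorted. A small sketch, using a hypothetical reordering of the question's data (`shuffled` is not from the original post):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input where the two "animal" entries are not adjacent.
shuffled = [("animal", "bear"), ("plant", "cactus"), ("animal", "duck")]

# Without sorting, "animal" starts a new group each time it reappears.
keys_unsorted = [key for key, _ in groupby(shuffled, key=itemgetter(0))]
print(keys_unsorted)  # ['animal', 'plant', 'animal']

# Sorting on the same key function first yields one group per key.
by_kind = sorted(shuffled, key=itemgetter(0))
keys_sorted = [key for key, _ in groupby(by_kind, key=itemgetter(0))]
print(keys_sorted)  # ['animal', 'plant']
```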

>>> from itertools import groupby
>>> 
>>> things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
         ("vehicle", "speed boat"), ("vehicle", "school bus")]
>>> 
>>> 
>>> for _, g in groupby(things, lambda x:x[0]):
    print(list(g))

[('animal', 'bear'), ('animal', 'duck')]
[('plant', 'cactus')]
[('vehicle', 'speed boat'), ('vehicle', 'school bus')]
>>>
>>> from operator import itemgetter
>>> l = [list(g) for _, g in groupby(things, itemgetter(0))]
>>> l
[[('animal', 'bear'), ('animal', 'duck')], [('plant', 'cactus')], [('vehicle', 'speed boat'), ('vehicle', 'school bus')]]
>>> from collections import defaultdict
>>> 
>>> d = defaultdict(list)
>>>
>>> for k, v in groupby(things, itemgetter(0)):
    # Unpack each (key, name) pair directly instead of filtering out the key
    for _, name in v:
        d[k].append(name)


>>> d
defaultdict(<class 'list'>, {'animal': ['bear', 'duck'], 'plant': ['cactus'], 'vehicle': ['speed boat', 'school bus']})

Upvotes: 0

gtlambert

Reputation: 11961

You could use a list comprehension as follows:

from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"),
         ("vehicle", "speed boat"), ("vehicle", "school bus")]


g = groupby(things, lambda x: x[0])
answer = [list(group[1]) for group in g]
print(answer)

Output

[[('animal', 'bear'), ('animal', 'duck')],
 [('plant', 'cactus')],
 [('vehicle', 'speed boat'), ('vehicle', 'school bus')]]

Upvotes: 1
