Idle_92
Idle_92

Reputation: 69

Regroup sublists by first element in python

I have a nested list that looks something like this:

first_list = [[a, 1], [b, 3], [a, 6], [a, 2], [b, 4], [b, 5], ...]

I want to group these by their first element, and create a new nested list that looks like:

new_list = [ [1, 6, 2, ...], [3, 4, 5, ...], ...]

where all of the elements that started with a go in the first sublist and so on. The number of different values a, b, etc. are not known before runtime, or I could do something like:

a_list = []
b_list = []
for tag, x in first_list:
    if tag == a:
        a_list.append(x)
    elif tag == b:
        b_list.append(x)
new_list = [a_list, b_list]

I am struggling to adapt this for an arbitrary number of tags, however.

I probably omitted an important part of the question, but I should say that I already have a list of "tags", i.e:

tags = [a, b, c, d, ...]

They are not actually characters, hence the lack of inverted commas, but they should be hashable in any case.

Upvotes: 1

Views: 1820

Answers (5)

jpp
jpp

Reputation: 164693

With Python, and programming in general, you should avoid creating a variable number of variables.

defaultdict

You can use a defaultdict of list objects. This extends naturally to an arbitrary number of groups without having to name variables explicitly.

first_list = [['a', 1], ['b', 3], ['a', 6], ['a', 2], ['b', 4], ['b', 5]]

from collections import defaultdict

dd = defaultdict(list)

for cat, num in first_list:
    dd[cat].append(num)

defaultdict(list, {'a': [1, 6, 2],
                   'b': [3, 4, 5]})

groupby

The defaultdict solution has O(n) complexity, but an aptly named itertools.groupby solution is possible which requires sorting and O(n log n) complexity:

from itertools import groupby
from operator import itemgetter

sorter = sorted(first_list, key=itemgetter(0))
grouper = groupby(sorter, key=itemgetter(0))
res = {i: list(map(itemgetter(1), j)) for i, j in grouper}

{'a': [1, 6, 2], 'b': [3, 4, 5]}

List of list output

This is as trivial as calling list on dict.values:

res_list = list(res.values())

Upvotes: 4

Yasin Yousif
Yasin Yousif

Reputation: 967

Okay, there's a built-in methods for this in python , but in abstract algorithmic way , we could say:

first_list = [["a", 1], ["b", 3], ["a", 6], ["a", 2], ["b", 4], ["b", 5],["c",5]]

indx_list = [x[0] for x in first_list]

new_list = [[] for x in range(len(first_list))]

for x in first_list:
    new_list[indx_list.index(x[0])] += [x[-1]]

print(new_list)

Upvotes: 0

Woody1193
Woody1193

Reputation: 7980

This is a great opportunity to use the itertools library and a list-comprehension:

first_list = [['a', 1], ['b', 3], ['a', 6], ['a', 2], ['b', 4], ['b', 5], ...]
keyfunc = lambda x: x[0]
new_list = [[v1[1] for v1 in v] for k, v in itertools.groupby(sorted(first_list, key = keyfunc), key = keyfunc)]

What I'm doing here is grouping the list by the first value in the sublist and pulling the second value off. Note that the list needs to be sorted beforehand so this will run in O(n log n) time.

Upvotes: 0

riddler
riddler

Reputation: 467

A reduce would work for any numbers of tags.

first_list = [['a', 1], ['b', 3], ['a', 6], ['a', 2], ['b', 4], ['b', 5]]
def lambda_group(acc, val):
    tag, x = val
    if key not in acc:
        acc[key] = []
    acc[key].append(value)
    return acc
grouped_vals = reduce(lambda_group, first_list, {})
regrouped = list(grouped_vals.values())

Produces [[1, 6, 2], [3, 4, 5]]

Upvotes: 0

Yilun Zhang
Yilun Zhang

Reputation: 9018

First of all, your a and b should probably be strings.

You can do this using list comprehensions:

first_list = [["a", 1], ["b", 3], ["a", 6], ["a", 2], ["b", 4], ["b", 5]]
a_list = [x for x in first_list if x[0] == "a"]
b_list = [x for x in first_list if x[0] == "b"]
new_list = [a_list, b_list]

Upvotes: 1

Related Questions