bbell11

Reputation: 51

Splitting a list of lists into multiple lists by length of sub list

I have a sorted list that looks like:

tokens = [[46565], [44460], [73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053], [260, 259, 1151], [83, 31840, 292, 3826]]

and I want to split it into distinct lists by the length of sublist like so:

a = [[46565], [44460]]
b = [[73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053]]
c = [[260, 259, 1151]]
d = [[83, 31840, 292, 3826]]

I'm having some trouble trying to do this without just looping over the entire original list and checking the length of each sublist.

I thought perhaps I could do something with:

lengths = list(map(len,tokens))

for k, v in zip(lengths, tokens):
    <SOME CODE HERE>

Any ideas?

Upvotes: 1

Views: 1470

Answers (3)

Matthias Fripp

Reputation: 18625

This is about as efficient as you can get, and pretty simple:

tokens = [
    [46565], [44460], [73, 2062], [1616, 338], 
    [9424, 24899], [1820, 11268], [43533, 5356], 
    [9930, 1053], [260, 259, 1151], 
    [83, 31840, 292, 3826]
]
groups = {}
for sublist in tokens:
    groups.setdefault(len(sublist), []).append(sublist)

After this runs, groups will be a dictionary with keys for the length of the sublist and values that are all the sublists of that length, in the order they were found in tokens. You can then assign those entries to named variables if you want (a = groups[1], etc.), but for most workflows you will do better to work directly with the groups dictionary, since that generalizes the solution (What if there's a 0-length list? What about a 15-item list?).

A plain list comprehension is a poor fit here, because each input value has to be routed to a different output group. For aggregation like this, the most straightforward solution is almost always to run a for loop over your input data and create or update entries in an output dictionary.

The .setdefault method of dictionaries is also very useful for this pattern, since it saves you the trouble of checking whether the entry exists before you update it. Alternatively, you could use groups = collections.defaultdict(list), then just update it via groups[len(sublist)].append(sublist).
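For reference, the defaultdict variant mentioned above might look like the sketch below. The names a and no_such_length are only illustrative, and converting back to a plain dict at the end is an optional choice, not something the approach requires.

```python
from collections import defaultdict

tokens = [
    [46565], [44460], [73, 2062], [1616, 338],
    [9424, 24899], [1820, 11268], [43533, 5356],
    [9930, 1053], [260, 259, 1151],
    [83, 31840, 292, 3826],
]

# A missing key is created as an empty list on first access,
# so no existence check is needed before appending.
groups = defaultdict(list)
for sublist in tokens:
    groups[len(sublist)].append(sublist)

# Optional: convert to a plain dict so that later lookups of
# absent lengths do not silently create new empty entries.
groups = dict(groups)

a = groups[1]                        # all sublists of length 1
no_such_length = groups.get(15, [])  # illustrative lookup of an absent length
```

Using .get with a default keeps lookups safe for lengths that never occurred, which is the generalization the dictionary buys you over named variables.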

Upvotes: 2

yatu

Reputation: 88266

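One way is to combine sorted with itertools.groupby. If tokens is already ordered by sublist length, as in the question, the sorted call is redundant but harmless:

from itertools import groupby
[list(v) for _, v in groupby(sorted(tokens, key=len), key=len)]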

[[[46565], [44460]],
 [[73, 2062],
  [1616, 338],
  [9424, 24899],
  [1820, 11268],
  [43533, 5356],
  [9930, 1053]],
 [[260, 259, 1151]],
 [[83, 31840, 292, 3826]]]
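If you also want to know which length each group belongs to, a dict comprehension over the same groupby call is one option. This is a minimal sketch: the names by_length, unsorted and keys are illustrative, and the unpacking into a, b, c, d assumes that lengths 1 through 4 all occur in the data.

```python
from itertools import groupby

tokens = [
    [46565], [44460], [73, 2062], [1616, 338],
    [9424, 24899], [1820, 11268], [43533, 5356],
    [9930, 1053], [260, 259, 1151],
    [83, 31840, 292, 3826],
]

# groupby only merges adjacent items with equal keys, so the input
# must be ordered by the grouping key first.
by_length = {
    length: list(group)
    for length, group in groupby(sorted(tokens, key=len), key=len)
}

# Assumes lengths 1..4 are all present; a missing one raises KeyError.
a, b, c, d = (by_length[n] for n in (1, 2, 3, 4))

# Without sorting, equal lengths that are not adjacent end up in
# separate groups: here the two length-1 lists are not merged.
unsorted = [[1], [2, 3], [4]]
keys = [k for k, _ in groupby(unsorted, key=len)]
```

The last two lines show why the sort matters: on unordered input, groupby yields the same key more than once.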

Upvotes: 3

Kavin Dsouza

Reputation: 989

This is not the most efficient way to do it, but it follows the approach you sketched in the question and names the groups a, b, c, and so on.

import string

alphabets = string.ascii_lowercase

tokens = [[46565], [44460], [73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053], [260, 259, 1151], [83, 31840, 292, 3826]]

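# Map 1 -> 'a', 2 -> 'b', ..., 26 -> 'z' (ord('a') is 97)
numbering = {(ord(k) - 96): k for k in alphabets}
# Pre-create an empty list for every letter
output = {k: [] for k in alphabets}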


lengths = list(map(len,tokens))

# Raises KeyError for sublists of length 0 or longer than 26
for k, v in zip(lengths, tokens):
    output[numbering[k]].append(v)

print(output)

This is the output:

{'a': [[46565], [44460]], 'b': [[73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053]], 'c': [[260, 259, 1151]], 'd': [[83, 31840, 292, 3826]], 'e': [], 'f': [], 'g': [], 'h': [], 'i': [], 'j': [], 'k': [], 'l': [], 'm': [], 'n': [], 'o': [], 'p': [], 'q': [], 'r': [], 's': [], 't': [], 'u': [], 'v': [], 'w': [], 'x': [], 'y': [], 'z': []}
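If the 22 empty lists are unwanted, one possible variation creates a letter key only when a sublist of that length actually appears. This is a sketch, not part of the code above: the explicit range check and the ValueError are assumptions about how out-of-range lengths should be handled.

```python
import string

tokens = [
    [46565], [44460], [73, 2062], [1616, 338],
    [9424, 24899], [1820, 11268], [43533, 5356],
    [9930, 1053], [260, 259, 1151],
    [83, 31840, 292, 3826],
]

output = {}
for sublist in tokens:
    n = len(sublist)
    # Only lengths 1..26 map onto a lowercase letter (assumed policy:
    # fail loudly on anything else).
    if not 1 <= n <= 26:
        raise ValueError(f"no letter for sublist of length {n}")
    letter = string.ascii_lowercase[n - 1]  # 1 -> 'a', 2 -> 'b', ...
    output.setdefault(letter, []).append(sublist)

print(output)
```

With the sample data this leaves only the keys 'a' through 'd' in output.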

Upvotes: 0
