Reputation: 51
I have a sorted list that looks like:
tokens = [[46565], [44460], [73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053], [260, 259, 1151], [83, 31840, 292, 3826]]
and I want to split it into distinct lists by sublist length, like so:
a = [[46565], [44460]]
b = [[73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053]]
c = [[260, 259, 1151]]
d = [[83, 31840, 292, 3826]]
I'm having some trouble trying to do this without just looping over the entire original list and checking the length of each sublist.
I thought perhaps I could do something with:
lengths = list(map(len,tokens))
for k, v in zip(lengths, tokens):
    <SOME CODE HERE>
Any ideas?
Upvotes: 1
Views: 1470
Reputation: 18625
This is about as efficient as you can get, and pretty simple:
tokens = [
[46565], [44460], [73, 2062], [1616, 338],
[9424, 24899], [1820, 11268], [43533, 5356],
[9930, 1053], [260, 259, 1151],
[83, 31840, 292, 3826]
]
groups = {}
for sublist in tokens:
    groups.setdefault(len(sublist), []).append(sublist)
After this runs, groups will be a dictionary with keys for the length of the sublist and values that are all the sublists of that length, in the order they were found in tokens. You can then assign those entries to named variables if you want (a = groups[1], etc.), but for most workflows you will do better to work directly with the groups dictionary, since that generalizes the solution (What if there's a 0-length list? What about a 15-item list?).
There is no simple way to do this with a plain one-line list comprehension, because each input value has to go into a different output cluster. For aggregation like this, the best solution is almost always to run a for loop over your input data and create or update entries in an output dictionary.
The .setdefault method of dictionaries is also very useful for this pattern, since it saves you the trouble of checking whether the entry exists before you update it. Alternatively, you could use groups = collections.defaultdict(list), then just update it via groups[len(sublist)].append(sublist).
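For reference, the defaultdict alternative mentioned above might look something like this. It is only a sketch and assumes the same tokens list as in the question:

```python
from collections import defaultdict

tokens = [
    [46565], [44460], [73, 2062], [1616, 338],
    [9424, 24899], [1820, 11268], [43533, 5356],
    [9930, 1053], [260, 259, 1151],
    [83, 31840, 292, 3826],
]

# A missing key is created as an empty list the first time it is accessed,
# so there is no need to check whether the entry already exists.
groups = defaultdict(list)
for sublist in tokens:
    groups[len(sublist)].append(sublist)

print(dict(groups))
```

One thing to keep in mind: merely reading a missing key from a defaultdict (e.g. groups[15]) inserts an empty list for it, so use groups.get(15) or convert to a plain dict if that matters.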
Upvotes: 2
Reputation: 88266
One way is sorted with itertools.groupby:
from itertools import groupby
[list(v) for _, v in groupby(sorted(tokens, key=len), key=len)]
[[[46565], [44460]],
[[73, 2062],
[1616, 338],
[9424, 24899],
[1820, 11268],
[43533, 5356],
[9930, 1053]],
[[260, 259, 1151]],
[[83, 31840, 292, 3826]]]
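If you also want to keep each length alongside its group, a minimal sketch could build a dictionary from the same groupby call. The name by_length is just an illustrative choice, and tokens is assumed to be the list from the question. Note that groupby only merges adjacent items with equal keys, which is why the input is sorted by length first:

```python
from itertools import groupby

tokens = [
    [46565], [44460], [73, 2062], [1616, 338],
    [9424, 24899], [1820, 11268], [43533, 5356],
    [9930, 1053], [260, 259, 1151],
    [83, 31840, 292, 3826],
]

# Sort by length so equal-length sublists are adjacent, then group them.
# by_length (hypothetical name) maps each length to its list of sublists.
by_length = {
    length: list(group)
    for length, group in groupby(sorted(tokens, key=len), key=len)
}

print(by_length)
```

Since sorted is stable, sublists of the same length keep the order they had in tokens.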
Upvotes: 3
Reputation: 989
Not the most efficient way to do it, but it follows most of your original approach.
import string
alphabets = string.ascii_lowercase
tokens = [[46565], [44460], [73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053], [260, 259, 1151], [83, 31840, 292, 3826]]
numbering = {(ord(k)-96):k for k in alphabets}
output = {k:[] for k in alphabets}
lengths = list(map(len,tokens))
for k, v in zip(lengths, tokens):
    output[numbering[k]].append(v)
print(output)
This is the output:
{'a': [[46565], [44460]], 'b': [[73, 2062], [1616, 338], [9424, 24899], [1820, 11268], [43533, 5356], [9930, 1053]], 'c': [[260, 259, 1151]], 'd': [[83, 31840, 292, 3826]], 'e': [], 'f': [], 'g': [], 'h': [], 'i': [], 'j': [], 'k': [], 'l': [], 'm': [], 'n': [], 'o': [], 'p': [], 'q': [], 'r': [], 's': [], 't': [], 'u': [], 'v': [], 'w': [], 'x': [], 'y': [], 'z': []}
Upvotes: 0