Reputation: 4921
I have a list of strings, already cleaned (split(',') can be safely used) and correctly sorted by number. As a small example:
l = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
What I'm trying to achieve is to split it into sublists that each start and end with a single string, that is:
[
['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]
I thought of adding some logic like the following code, but I'm not sure if I'm on the right track:
tl = []
for i in l:
    # just get the variable
    val = i
    tl.append(val)
    # split by ,
    val_split = len(i.split(','))
    # check if the value is the first element of the list (C1)
    if val == l[0]:
        print(1, val)
    # check if the split gives more than one part (C1,C2)
    elif val_split > 1:
        print(2, val)
    # check if the split gives exactly one part (C4)
    elif val_split == 1:
        # here the code should compare if the value is equal to the last value of the nested list. If yes, go on with the next value (C5)
        if val != tl[-1]:
            print(3, val)
        else:
            print(4, val)
Upvotes: 2
Views: 128
Reputation: 41905
Alternatively, we can throw groupby from itertools at this problem:
from itertools import groupby
lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
groups = []
for key, group in groupby(lst, lambda x: ',' in x):
    if key:
        groups[-1].extend(group)
    else:
        a, *b = group
        if b:
            groups[-1].append(a)
            groups.append(b)
        else:
            if groups:
                groups[-1].append(a)
            else:
                groups.append([a])
print(groups)
This assumes the input is already in the proper order and only needs to be regrouped.
OUTPUT
% python3 test.py
[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
%
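As a rough illustration of why the non-comma branch has to unpack `a, *b`, the sketch below only lists the runs that groupby yields for the example input. The name `runs` is made up for this illustration. Where one sublist ends and the next begins, the two single strings land in the same run, so the first closes the current sublist and the second opens a new one.

```python
from itertools import groupby

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7',
       'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

# Materialise each run: key is True for comma items, False for single strings.
runs = [(key, list(group)) for key, group in groupby(lst, lambda x: ',' in x)]

for key, run in runs:
    print(key, run)
# The third run is (False, ['C4', 'C5']): 'C4' ends one sublist, 'C5' starts the next.
```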
Upvotes: 0
Reputation: 107015
You can use a generator to produce items after the first item of each sublist until an item with no comma is found:
def until_no_comma(seq):
    for i in seq:
        yield i
        if ',' not in i:
            return
seq = iter(l)
print([[i, *until_no_comma(seq)] for i in seq])
This outputs:
[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
Demo: https://ideone.com/VJ4fnW
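The approach relies on the outer comprehension and the generator sharing one iterator, so it should not depend on the sublists having equal length. A minimal sketch of that, wrapped in a helper whose name `split_groups` is invented here, and run on a made-up input with sublists of different sizes:

```python
def until_no_comma(seq):
    # Yield items from the shared iterator up to and including
    # the first item that has no comma.
    for i in seq:
        yield i
        if ',' not in i:
            return

def split_groups(items):
    # Hypothetical wrapper: one iterator is consumed by both the outer
    # comprehension (sublist starts) and the generator (rest of the sublist).
    seq = iter(items)
    return [[first, *until_no_comma(seq)] for first in seq]

# Made-up data: a sublist of 3 items followed by a sublist of 4 items.
uneven = ['A', 'A,B', 'B', 'C', 'C,D', 'D,E', 'E']
result = split_groups(uneven)
print(result)
```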
Upvotes: 1
Reputation: 4560
There are three options. All of them assume that every sublist has exactly 5 elements, as in the example.
Option 1: Using a list comprehension.
Code:
def split_into_chunks(lst, chunk_size):
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]
split_chunks = split_into_chunks(l, 5)  # chunk size 5 matches the example data
print(split_chunks)
Option 2: Using islice from itertools.
Code:
from itertools import islice
def split_chunks(iterable, size):
    iterator = iter(iterable)
    for first in iterator:
        yield [first] + list(islice(iterator, size - 1))
chunked_list = list(split_chunks(l, 5))  # chunk size 5 matches the example data
print(chunked_list)
Option 3: A one-liner list comprehension, without using append (the same comprehension as in Option 1).
Code:
def split_into_chunks(lst, chunk_size):
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]
chunks = split_into_chunks(l, 5)
print(chunks)
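Note that fixed-size chunking only matches the requested output when every sublist happens to have the same length. A small sketch with made-up data (the list `uneven` is not from the question) shows what happens otherwise:

```python
def split_into_chunks(lst, chunk_size):
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

# Made-up input: the first sublist should have 3 items, the second 4.
uneven = ['C1', 'C1,C2', 'C2', 'C3', 'C3,C4', 'C4,C5', 'C5']
chunks = split_into_chunks(uneven, 5)
print(chunks)
# The first chunk ends on 'C3,C4', which is not a single string,
# so the "start and end with a single string" requirement is not met.
```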
Upvotes: -1
Reputation: 27201
If the input list is guaranteed to start and end with a single string and if there will always be at least one adjacent pair of single strings then:
lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
result = [[]]
for e in lst:
    result[-1].append(e)
    if "," not in e:
        if len(result[-1]) > 1:
            result.append([])
result.pop()
print(result)
Output:
[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
Upvotes: 1
Reputation: 10465
With split_when from more-itertools:
from more_itertools import split_when
lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
result = list(split_when(lst, lambda s, t: ',' not in s+t))
print(result)
Or just basic:
lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
result = []
it = iter(lst)
for s in it:
    sub = [s]
    for t in it:
        sub.append(t)
        if ',' not in t:
            break
    result.append(sub)
print(result)
Upvotes: 1
Reputation: 102529
Given data s like below
s = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
you can try itertools along with numpy
import numpy as np
import itertools
grp = np.ceil(np.cumsum(np.char.count(s, ',')==0)/2)
[list(g) for k, g in itertools.groupby(s, lambda i: grp[s.index(i)])]
or without numpy
from itertools import accumulate, groupby
from math import ceil
grp = [ceil(x/2) for x in accumulate(map(lambda x: int(x.count(',')==0), s))]
[list(g) for k, g in groupby(s, lambda i: grp[s.index(i)])]
such that you will obtain
[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
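One caveat: `s.index(i)` returns the position of the first occurrence, so the lookups above rely on every string in `s` being unique. A possible variant, sketched here under that observation and with the invented helper name `group_by_single_count`, pairs each item with its group id through zip instead:

```python
from itertools import accumulate, groupby
from math import ceil

def group_by_single_count(items):
    # Running count of single (comma-free) strings seen so far.
    singles = accumulate(int(',' not in x) for x in items)
    # Singles 1 and 2 belong to group 1, singles 3 and 4 to group 2, and so on.
    grp = [ceil(n / 2) for n in singles]
    # Group on the id carried alongside each item, not on a lookup by value.
    return [[item for _, item in g]
            for _, g in groupby(zip(grp, items), key=lambda pair: pair[0])]

# Made-up input with repeated values, where a lookup by value would be ambiguous.
dup = ['C1', 'C1,C2', 'C2', 'C1', 'C1,C2', 'C2']
result = group_by_single_count(dup)
print(result)
```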
Upvotes: 1
Reputation: 522506
Here is my take on this, using regular expressions. We can recombine your starting list using some distinct separator, say |, then use re.findall to find each run that starts with a single C string, continues with one or more multi-C strings, and ends with a single C string.
import re
inp = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
x = '|'.join(inp)
parts = re.findall(r'(?<![^|])C\d+(?:\|(?:C\d+(?:,C\d+)+)+)+\|C\d+(?![^|])', x)
output = [p.split('|') for p in parts]
print(output)
This prints:
[
['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]
Upvotes: 1