matteo

Reputation: 4921

Create nested lists based on split of characters

I have a list of strings that is already cleaned (so split(',') can be used safely) and sorted by number. A small example:

l = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

What I'm trying to achieve is to split it into sublists that each start and end with a single (comma-free) string, that is:

[
    ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
    ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
    ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]

I thought of adding some logic like the following, but I'm not sure I'm on the right track:

tl = []

for i in l:
    
    # just get the variable
    val = i
    
    tl.append(val)
    
    # count the comma-separated parts
    val_split = len(i.split(','))

    # check if the value is the first element of the list (C1)
    if val == l[0]:
        print(1, val)
    # check if the string splits into more than one part (C1,C2)
    elif val_split > 1:
        print(2, val)
    # check if the string splits into exactly one part (C4)
    elif val_split == 1:
        # here the code should check whether the value is the last value of the nested list; if so, go on with the next value (C5)
        if val != tl[-1]:
            print(3, val)
        else:
            print(4, val)
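One way the idea in the loop above (a comma-free item either opens or closes a group) could be completed is a small state machine. This is a minimal sketch, not the asker's code: the function name `group_runs` and the `inside` flag are hypothetical, and it assumes the input is well formed, i.e. every group opens and closes with a comma-free item.

```python
l = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
     'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

def group_runs(items):
    # Hypothetical helper: assumes each group opens and closes with a comma-free item
    groups = []
    inside = False  # True between an opening and a closing comma-free item
    for item in items:
        if not inside:
            groups.append([item])  # this item opens a new group
            inside = True
        else:
            groups[-1].append(item)
            if ',' not in item:
                inside = False  # a comma-free item closes the current group
    return groups

print(group_runs(l))
```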

Upvotes: 2

Views: 128

Answers (7)

cdlane

Reputation: 41905

Alternatively, we can throw groupby from itertools at this problem:

from itertools import groupby

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

groups = []

for key, group in groupby(lst, lambda x: ',' in x):

    if key:
        groups[-1].extend(group)
    else:
        a, *b = group

        if b:
            groups[-1].append(a)
            groups.append(b)
        else:
            if groups:
                groups[-1].append(a)
            else:
                groups.append([a])

print(groups)

This assumes the input is already in the proper order and just needs to be regrouped.

OUTPUT

% python3 test.py
[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
% 
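To see why the `a, *b = group` unpacking works, it can help to print the runs that `groupby` produces with the same key function. A comma-free run in the middle of the list holds the closing item of one group and the opening item of the next, which is what `a` and `b` pick apart:

```python
from itertools import groupby

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
       'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

# Same key as in the answer: True for items containing a comma
runs = [(key, list(group)) for key, group in groupby(lst, lambda x: ',' in x)]
for key, group in runs:
    print(key, group)
```

The third run, for example, is `False ['C4', 'C5']`: `C4` closes the first group and `C5` opens the second. Only the first and last comma-free runs hold a single item.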

Upvotes: 0

blhsing

Reputation: 107015

You can use a generator that yields the items after the first item of each sublist, up to and including the next item with no comma:

def until_no_comma(seq):
    for i in seq:
        yield i
        if ',' not in i:
            return
seq = iter(l)
print([[i, *until_no_comma(seq)] for i in seq])

This outputs:

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]

Demo: https://ideone.com/VJ4fnW
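The trick is that the comprehension and the generator share one iterator, so each sublist simply consumes items up to the next comma-free one, and the groups do not need to have the same length. A quick check with a hypothetical, unevenly sized input (`uneven` and the wrapper name `split_groups` are made up for illustration):

```python
def until_no_comma(seq):
    # Same generator as in the answer: stop right after yielding a comma-free item
    for i in seq:
        yield i
        if ',' not in i:
            return

def split_groups(items):
    seq = iter(items)  # one shared iterator, advanced by both the comprehension and the generator
    return [[i, *until_no_comma(seq)] for i in seq]

# Hypothetical input whose groups have three and five items
uneven = ['C1', 'C1,C2', 'C2', 'C3', 'C3,C4', 'C4,C5', 'C5,C6', 'C6']
print(split_groups(uneven))
```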

Upvotes: 1

toyota Supra

Reputation: 4560

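There are three options. All of them split the list into fixed-size chunks, so they rely on every group having exactly five items, as in the example: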

Option 1: Using list comprehensions.

Code:

def split_into_chunks(lst, chunk_size):
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

split_chunks = split_into_chunks(l, 5)  # chunk size of 5
print(split_chunks)

Option 2: Using itertools.islice. Code:

from itertools import islice

def split_chunks(iterable, size):
    iterator = iter(iterable)

    for first in iterator:
        # the first item plus the next size - 1 items from the same iterator
        yield [first] + list(islice(iterator, size - 1))

chunked_list = list(split_chunks(l, 5))  # chunk size of 5

print(chunked_list)

Option 3: The same slicing as Option 1, written as a one-line list comprehension without append.

Code:

def split_into_chunks(lst, chunk_size):
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

chunks = split_into_chunks(l, 5)

print(chunks)
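Because these options cut at fixed positions, they only reproduce the desired grouping when every group has exactly five items. With a hypothetical input whose groups have three and five items (`uneven` is made up for illustration), the chunks no longer line up with the groups:

```python
def split_into_chunks(lst, chunk_size):
    # Fixed-size slicing, as in Options 1 and 3
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

# Hypothetical input: the intended groups are C1..C2 (3 items) and C3..C6 (5 items)
uneven = ['C1', 'C1,C2', 'C2', 'C3', 'C3,C4', 'C4,C5', 'C5,C6', 'C6']
print(split_into_chunks(uneven, 5))
```

The first chunk runs past `C2` into the second group, so splitting at comma-free items rather than at a fixed size is needed when group sizes vary.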

Upvotes: -1

Adon Bilivit

Reputation: 27201

If the input list is guaranteed to start and end with a single string, and there will always be at least one adjacent pair of single strings, then:

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
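result = [[]]
for e in lst:
    result[-1].append(e)
    # a comma-free item that is not the first of its sublist closes the sublist
    if "," not in e and len(result[-1]) > 1:
        result.append([])
result.pop()  # drop the empty sublist opened after the last item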
print(result)

Output:

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]

Upvotes: 1

no comment

Reputation: 10465

With split_when from more-itertools:

from more_itertools import split_when

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

result = list(split_when(lst, lambda s, t: ',' not in s+t))

print(result)

Or with just basic Python:

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

result = []
it = iter(lst)
for s in it:
    sub = [s]
    for t in it:
        sub.append(t)
        if ',' not in t:
            break
    result.append(sub)

print(result)
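The predicate passed to split_when is called on each pair of adjacent items, and the list is cut between the two whenever neither of them contains a comma. Applying the same predicate to adjacent pairs by hand (here with `zip`, so more-itertools is not needed) shows exactly where the cuts fall:

```python
lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
       'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

# Same predicate as in the split_when call: True when neither s nor t contains a comma
cut_points = [(s, t) for s, t in zip(lst, lst[1:]) if ',' not in s + t]
print(cut_points)  # [('C4', 'C5'), ('C8', 'C10')]
```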

Upvotes: 1

ThomasIsCoding

Reputation: 102529

Given data s like the following:

s = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

you can try itertools along with numpy:

import numpy as np
import itertools
grp = np.ceil(np.cumsum(np.char.count(s, ',')==0)/2)
print([list(g) for k, g in itertools.groupby(s, lambda i: grp[s.index(i)])])

or, without numpy:

from itertools import accumulate, groupby
from math import ceil

grp = [ceil(x/2) for x in accumulate(map(lambda x: int(x.count(',')==0), s))]
print([list(g) for k, g in groupby(s, lambda i: grp[s.index(i)])])

In both cases you obtain:

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
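A note on the key function: `s.index(i)` does a linear search for every item (quadratic overall) and returns the first occurrence, so a repeated string would be assigned the wrong group. A sketch of the same group-number idea that pairs each item with its number via `zip` instead, assuming (as the question states) that every group opens and closes with a comma-free item:

```python
from itertools import accumulate, groupby
from math import ceil
from operator import itemgetter

s = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
     'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

# Running count of comma-free items seen so far: 1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 5, ...
singles_seen = accumulate(int(',' not in x) for x in s)
# Every second comma-free item closes a group, so ceil(count / 2) is the group number
grp = [ceil(n / 2) for n in singles_seen]
# Pair each item with its own group number instead of looking it up via s.index
result = [[x for _, x in g] for _, g in groupby(zip(grp, s), key=itemgetter(0))]
print(result)
```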

Upvotes: 1

Tim Biegeleisen

Reputation: 522506

Here is my take on this, using regular expressions. We can join your starting list with a distinct separator, say |, then use re.findall to find each run that starts with a single C item, continues with one or more comma-separated items, and ends with a single C item.

import re

inp = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
x = '|'.join(inp)
parts = re.findall(r'(?<![^|])C\d+(?:\|(?:C\d+(?:,C\d+)+)+)+\|C\d+(?![^|])', x)
output = [p.split('|') for p in parts] 
print(output)

This prints:

[
    ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
    ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
    ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]
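The pattern is dense, so here is the same approach with the pattern written in `re.VERBOSE` mode and commented part by part. This is a sketch that assumes items look like `C` followed by digits, as in the question; it also drops the inner `+` on the comma-item group, which gives the same result for `|`-separated input like this:

```python
import re

inp = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
       'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

pattern = re.compile(r"""
    (?<![^|])               # start at the beginning of the string or right after a '|'
    C\d+                    # opening single item, e.g. C1
    (?:\|C\d+(?:,C\d+)+)+   # one or more comma items, each preceded by '|', e.g. |C1,C2
    \|C\d+                  # closing single item, e.g. |C4
    (?![^|])                # end at the end of the string or right before a '|'
""", re.VERBOSE)

x = '|'.join(inp)
output = [p.split('|') for p in pattern.findall(x)]
print(output)
```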

Upvotes: 1
