
Splitting a List Based on a Substring

I have the following list:

['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']

I want to split this list into multiple lists so that each sublist will have the substring "(Reg)" appear once:

[['1(Reg)', '100', '103', '102', '100'],
['2(Reg)', '98', '101', '100'],
['3(Reg)', '96', '99', '98'],
['4(Reg)', '100', '100', '100', '100'],
['5(Reg)', '98', '99', '99', '100'],
['6(Reg)', '99.47', '99.86', '99.67', '100']]

I've tried joining the list with a delimiter and splitting it by (Reg), but that didn't work. How can I split the list into a nested list like above?

Upvotes: 4

Answers (8)

whackamadoodle3000

Reputation: 6748

Here's another way with no libraries. It is a list comprehension built off of DYZ's answer:

w = []
[w.append([e]) if '(Reg)' in e else w[-1].append(e) for e in data]

Upvotes: 1

RoadRunner

Reputation: 26335

You can also try this:

from itertools import groupby

lst = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100',
       '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100',
       '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']

grouped = [list(g) for k, g in groupby(lst, key = lambda x: x.endswith('(Reg)'))]

result = [x + y for x, y in zip(grouped[0::2], grouped[1::2])]

print(result)

Which outputs:

[['1(Reg)', '100', '103', '102', '100'], ['2(Reg)', '98', '101', '100'], ['3(Reg)', '96', '99', '98'], ['4(Reg)', '100', '100', '100', '100'], ['5(Reg)', '98', '99', '99', '100'], ['6(Reg)', '99.47', '99.86', '99.67', '100']]

Upvotes: 1

Transhuman

Reputation: 3547

Using itertools.groupby

lst = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
from itertools import groupby
[a+b for a,b in zip(*([iter(list(g) for k, g in groupby(lst, lambda x:'Reg' in x))]*2))]

Output:

[['1(Reg)', '100', '103', '102', '100'],
 ['2(Reg)', '98', '101', '100'],
 ['3(Reg)', '96', '99', '98'],
 ['4(Reg)', '100', '100', '100', '100'],
 ['5(Reg)', '98', '99', '99', '100'],
 ['6(Reg)', '99.47', '99.86', '99.67', '100']]

Upvotes: 2

Pavel

Reputation: 7562

Ok, here's my take with super-simple standard list comprehensions (very similar to @jp_data_analysis's answer):

>>> from pprint import pprint
>>> d = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
>>> idx = [i for i in range(len(d)) if d[i].endswith("(Reg)")] + [len(d)]
>>> idx
[0, 5, 9, 13, 18, 23, 28]
>>> res = [d[idx[i-1]:idx[i]] for i in range(1,len(idx))]
>>> pprint(res)
[['1(Reg)', '100', '103', '102', '100'],
 ['2(Reg)', '98', '101', '100'],
 ['3(Reg)', '96', '99', '98'],
 ['4(Reg)', '100', '100', '100', '100'],
 ['5(Reg)', '98', '99', '99', '100'],
 ['6(Reg)', '99.47', '99.86', '99.67', '100']]

Explanation: idx holds the indices of every element ending in (Reg) (including the list length as the final element). Then the list res is defined via intervals between those elements.
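
The same interval idea can be written by pairing each boundary with the one after it. This is only a minimal sketch, assuming Python 3 and a shortened sample list; the names bounds and chunks are illustrative and not part of the answer above:

```python
# Shortened sample list; every item ending in "(Reg)" starts a new group.
d = ['1(Reg)', '100', '103', '2(Reg)', '98', '3(Reg)', '96', '99']

# Indices where a group starts, plus len(d) as the closing boundary.
bounds = [i for i, s in enumerate(d) if s.endswith('(Reg)')] + [len(d)]

# Pair each boundary with the next one and slice between them.
chunks = [d[start:stop] for start, stop in zip(bounds, bounds[1:])]

print(bounds)  # [0, 3, 5, 8]
print(chunks)  # [['1(Reg)', '100', '103'], ['2(Reg)', '98'], ['3(Reg)', '96', '99']]
```

Appending len(d) is what lets the last group run to the end of the list without special-casing it.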

On a philosophical note: every time you face a problem like this, ask yourself: how did I get here? Why do I need to deal with some super-fragile implicit-string-format-rules instead of a real data structure? One that takes intervals and data hierarchy into account? One that enforces limitations by design and allows for simple querying? (Find someone to blame and rant about them on Twitter :)

Upvotes: 4

jpp

Reputation: 164843

Here is one way, though not necessarily optimal:

from itertools import zip_longest

lst = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100',
       '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100',
       '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']

indices = [i for i, j in enumerate(lst) if '(Reg)' in j]
lst_new = [lst[i:j] for i, j in zip_longest(indices, indices[1:])]

# [['1(Reg)', '100', '103', '102', '100'],
#  ['2(Reg)', '98', '101', '100'],
#  ['3(Reg)', '96', '99', '98'],
#  ['4(Reg)', '100', '100', '100', '100'],
#  ['5(Reg)', '98', '99', '99', '100'],
#  ['6(Reg)', '99.47', '99.86', '99.67', '100']]

Upvotes: 5

Ajax1234

Reputation: 71471

You can use itertools.groupby with regular expressions:

import itertools
import re
s = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
new_data = [list(b) for _, b in itertools.groupby(s, key=lambda x:bool(re.findall(r'\d+\(', x)))]
final_data = [new_data[i]+new_data[i+1] for i in range(0, len(new_data), 2)]

Output:

[['1(Reg)', '100', '103', '102', '100'], 
 ['2(Reg)', '98', '101', '100'], 
 ['3(Reg)', '96', '99', '98'], 
 ['4(Reg)', '100', '100', '100', '100'], 
 ['5(Reg)', '98', '99', '99', '100'], 
 ['6(Reg)', '99.47', '99.86', '99.67', '100']]

Upvotes: 5

DYZ

Reputation: 57125

A slightly different (optimized) version of Willem Van Onsem's answer:

splitted = []

for item in l:  # l is the original list of strings
    if '(Reg)' in item:
        splitted.append([])
    splitted[-1].append(item)

#[['1(Reg)', '100', '103', '102', '100'], ['2(Reg)', '98', '101', '100'], 
# ['3(Reg)', '96', '99', '98'], ['4(Reg)', '100', '100', '100', '100'], 
# ['5(Reg)', '98', '99', '99', '100'], 
# ['6(Reg)', '99.47', '99.86', '99.67', '100']]
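
One caveat with the loop above: splitted[-1] raises an IndexError if the input does not start with a '(Reg)' item, because there is no sublist to append to yet. A hedged sketch of a variant that tolerates leading items by putting them in their own first group; that behaviour is an assumption about what a caller would want, and the function name split_on_marker is hypothetical:

```python
def split_on_marker(items, marker='(Reg)'):
    """Group items so that each item containing `marker` starts a new group."""
    groups = []
    for item in items:
        # Open a new group at every marker, and also for the very first
        # item, so groups[-1] always exists (assumed behaviour, see above).
        if marker in item or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


print(split_on_marker(['1(Reg)', '100', '2(Reg)', '98']))
# [['1(Reg)', '100'], ['2(Reg)', '98']]
print(split_on_marker(['x', '1(Reg)', '100']))
# [['x'], ['1(Reg)', '100']]
```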

Upvotes: 6

willeM_ Van Onsem

Reputation: 477794

We can use a for loop for this with two lists: one that builds the current row, and one that stores all the rows collected so far. Like:

rows = []
row = []
for word in data:
    if '(Reg)' in word:
        rows.append(row)
        row = []
    row.append(word)
rows.append(row)

with data the initial list of strings.

There is a problem with this, however: it will first add an empty row (given that the first element has (Reg) in it). We can prevent this by only adding non-empty rows, like:

rows = []
row = []
for word in data:
    if '(Reg)' in word:
        if row:
            rows.append(row)
        row = []
    row.append(word)
if row:
    rows.append(row)

We can generalize the above into a dedicated function:

def split_at(data, predicate, with_empty=False):
    rows = []
    row = []
    for word in data:
        if predicate(word):
            if with_empty or row:
                rows.append(row)
            row = []
        row.append(word)
    if with_empty or row:
        rows.append(row)
    return rows

We can then call it like:

split_at(our_list, lambda x: '(Reg)' in x)
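
To make the effect of with_empty concrete, here is a self-contained sketch that repeats the function and runs it on a shortened, hypothetical our_list (the sample values are illustrative, not the full list from the question):

```python
def split_at(data, predicate, with_empty=False):
    """Start a new row at every item for which predicate(item) is true."""
    rows = []
    row = []
    for word in data:
        if predicate(word):
            # Flush the row built so far; skip empty rows unless asked.
            if with_empty or row:
                rows.append(row)
            row = []
        row.append(word)
    # Flush the final row.
    if with_empty or row:
        rows.append(row)
    return rows


# Hypothetical shortened input.
our_list = ['1(Reg)', '100', '103', '2(Reg)', '98', '101']

result = split_at(our_list, lambda x: '(Reg)' in x)
print(result)
# [['1(Reg)', '100', '103'], ['2(Reg)', '98', '101']]

with_leading = split_at(our_list, lambda x: '(Reg)' in x, with_empty=True)
print(with_leading)
# [[], ['1(Reg)', '100', '103'], ['2(Reg)', '98', '101']]
```

With with_empty=True the leading empty row is kept, which matches the behaviour of the first version of the loop.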

Upvotes: 2
