Reputation: 6748
I have the following list:
['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
I want to split this list into multiple lists so that each sublist will have the substring "(Reg)" appear once:
[['1(Reg)', '100', '103', '102', '100'],
['2(Reg)', '98', '101', '100'],
['3(Reg)', '96', '99', '98'],
['4(Reg)', '100', '100', '100', '100'],
['5(Reg)', '98', '99', '99', '100'],
['6(Reg)', '99.47', '99.86', '99.67', '100']]
I've tried joining the list with a delimiter and splitting it by (Reg), but that didn't work. How can I split the list into a nested list like above?
Upvotes: 4
Views: 2254
Reputation: 6748
Here's another way with no libraries. It is a list comprehension built off of DYZ's answer:
w = []
[w.append([e]) if '(Reg)' in e else w[-1].append(e) for e in data]
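One thing worth knowing about this form: the comprehension is used only for its side effects, so the list it builds is a throwaway list of None values. A small sketch on a shortened, illustrative input (the name leftover is hypothetical and only there to make the discarded list visible):

```python
# Shortened, illustrative input (not the full list from the question).
data = ['1(Reg)', '100', '2(Reg)', '98']

w = []
# list.append returns None, so the comprehension itself collects Nones;
# the useful result accumulates in w as a side effect.
leftover = [w.append([e]) if '(Reg)' in e else w[-1].append(e) for e in data]

print(w)         # [['1(Reg)', '100'], ['2(Reg)', '98']]
print(leftover)  # [None, None, None, None]
```

A plain for loop does the same work without building the extra list.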
Upvotes: 1
Reputation: 26315
You can also try this:
from itertools import groupby
lst = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100',
'3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100',
'5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
grouped = [list(g) for k, g in groupby(lst, key = lambda x: x.endswith('(Reg)'))]
result = [x + y for x, y in zip(grouped[0::2], grouped[1::2])]
print(result)
Which outputs:
[['1(Reg)', '100', '103', '102', '100'], ['2(Reg)', '98', '101', '100'], ['3(Reg)', '96', '99', '98'], ['4(Reg)', '100', '100', '100', '100'], ['5(Reg)', '98', '99', '99', '100'], ['6(Reg)', '99.47', '99.86', '99.67', '100']]
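To see why pairing alternate groups works, it may help to look at what groupby produces before the zip step. A minimal sketch on a shortened list (the name sample and its contents are illustrative, and it assumes markers and values strictly alternate):

```python
from itertools import groupby

# Shortened, illustrative input.
sample = ['1(Reg)', '100', '103', '2(Reg)', '98']

# groupby starts a new run whenever the key changes, so marker runs and
# value runs alternate: [marker], [values], [marker], [values], ...
grouped = [list(g) for _, g in groupby(sample, key=lambda x: x.endswith('(Reg)'))]
print(grouped)  # [['1(Reg)'], ['100', '103'], ['2(Reg)'], ['98']]

# Even positions hold marker runs, odd positions hold the values that follow.
result = [x + y for x, y in zip(grouped[0::2], grouped[1::2])]
print(result)   # [['1(Reg)', '100', '103'], ['2(Reg)', '98']]
```

Note that this relies on the list starting with a marker and on every marker being followed by at least one value; two adjacent markers would fall into the same run and end up in one sublist.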
Upvotes: 1
Reputation: 3547
Using itertools.groupby
lst = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
from itertools import groupby
[a+b for a,b in zip(*([iter(list(g) for k, g in groupby(lst, lambda x:'Reg' in x))]*2))]
Output:
[['1(Reg)', '100', '103', '102', '100'],
['2(Reg)', '98', '101', '100'],
['3(Reg)', '96', '99', '98'],
['4(Reg)', '100', '100', '100', '100'],
['5(Reg)', '98', '99', '99', '100'],
['6(Reg)', '99.47', '99.86', '99.67', '100']]
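The zip(*[iter(...)]*2) part is the usual trick for walking an iterable two items at a time. A stripped-down sketch of just that idiom, with made-up letters in place of the groups:

```python
it = iter(['a', 'b', 'c', 'd'])

# [it] * 2 is a list holding the SAME iterator twice, so zip pulls two
# consecutive items from it for every tuple it builds.
pairs = list(zip(*[it] * 2))
print(pairs)  # [('a', 'b'), ('c', 'd')]
```

In the answer above, each pair is a marker group followed by its value group, which a+b then concatenates.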
Upvotes: 2
Reputation: 7552
Ok, here's my take with super-simple standard list comprehensions (very similar to @jp_data_analysis's answer):
>>> from pprint import pprint
>>> d = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
>>> idx = [i for i in range(len(d)) if d[i].endswith("(Reg)")] + [len(d)]
>>> idx
[0, 5, 9, 13, 18, 23, 28]
>>> res = [d[idx[i-1]:idx[i]] for i in range(1,len(idx))]
>>> pprint(res)
[['1(Reg)', '100', '103', '102', '100'],
['2(Reg)', '98', '101', '100'],
['3(Reg)', '96', '99', '98'],
['4(Reg)', '100', '100', '100', '100'],
['5(Reg)', '98', '99', '99', '100'],
['6(Reg)', '99.47', '99.86', '99.67', '100']]
Explanation: idx holds the indices of every element ending in (Reg), plus the list length as the final element. The list res is then built from the slices between consecutive indices.
On a philosophical note: every time you face a problem like this, ask yourself: how did I get here? Why do I need to deal with some super-fragile implicit-string-format-rules instead of a real data structure? One that takes intervals and data hierarchy into account? One that enforces limitations by design and allows for simple querying? Find someone to blame and rant about them on Twitter :)
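As one possible reading of "a real data structure", the flat list could be converted once into a mapping from label to numeric values. This is only a sketch under assumptions: the function name to_records is hypothetical, labels are assumed unique, and every non-marker string is assumed to parse as a float.

```python
def to_records(items, marker='(Reg)'):
    """Build {label: [float, ...]} from the flat list of strings."""
    records = {}
    current = None
    for item in items:
        if item.endswith(marker):
            current = item[:-len(marker)]   # '1(Reg)' -> '1'
            records[current] = []           # a repeated label would be overwritten
        elif current is not None:
            records[current].append(float(item))
    return records

# Shortened, illustrative input.
flat = ['1(Reg)', '100', '103', '2(Reg)', '98', '99.47']
records = to_records(flat)
print(records)  # {'1': [100.0, 103.0], '2': [98.0, 99.47]}
```

With the data in this shape, lookups such as records['2'] no longer depend on string conventions inside the list.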
Upvotes: 4
Reputation: 164673
Here is one way, though not necessarily optimal:
from itertools import zip_longest
lst = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100',
'3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100',
'5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
indices = [i for i, j in enumerate(lst) if '(Reg)' in j]
lst_new = [lst[i:j] for i, j in zip_longest(indices, indices[1:])]
# [['1(Reg)', '100', '103', '102', '100'],
# ['2(Reg)', '98', '101', '100'],
# ['3(Reg)', '96', '99', '98'],
# ['4(Reg)', '100', '100', '100', '100'],
# ['5(Reg)', '98', '99', '99', '100'],
# ['6(Reg)', '99.47', '99.86', '99.67', '100']]
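The reason for zip_longest rather than plain zip is the final chunk: the last index has no successor, and zip_longest pads it with None, which slices to the end of the list. A short sketch on an illustrative input:

```python
from itertools import zip_longest

# Shortened, illustrative input.
sample = ['1(Reg)', '100', '2(Reg)', '98', '99']
indices = [i for i, s in enumerate(sample) if '(Reg)' in s]  # [0, 2]

# zip_longest pads the shorter iterable with None, so the last pair is (2, None).
pairs = list(zip_longest(indices, indices[1:]))
print(pairs)   # [(0, 2), (2, None)]

# sample[2:None] is the same as sample[2:], so the final chunk runs to the end.
chunks = [sample[i:j] for i, j in pairs]
print(chunks)  # [['1(Reg)', '100'], ['2(Reg)', '98', '99']]

# Plain zip stops at the shorter input and would drop the last chunk.
truncated = [sample[i:j] for i, j in zip(indices, indices[1:])]
print(truncated)  # [['1(Reg)', '100']]
```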
Upvotes: 5
Reputation: 71451
You can use itertools.groupby with regular expressions:
import itertools
import re
s = ['1(Reg)', '100', '103', '102', '100', '2(Reg)', '98', '101', '100', '3(Reg)', '96', '99', '98', '4(Reg)', '100', '100', '100', '100', '5(Reg)', '98', '99', '99', '100', '6(Reg)', '99.47', '99.86', '99.67', '100']
new_data = [list(b) for _, b in itertools.groupby(s, key=lambda x: bool(re.search(r'\d+\(', x)))]
final_data = [new_data[i]+new_data[i+1] for i in range(0, len(new_data), 2)]
Output:
[['1(Reg)', '100', '103', '102', '100'],
['2(Reg)', '98', '101', '100'],
['3(Reg)', '96', '99', '98'],
['4(Reg)', '100', '100', '100', '100'],
['5(Reg)', '98', '99', '99', '100'],
['6(Reg)', '99.47', '99.86', '99.67', '100']]
Upvotes: 5
Reputation: 57033
A slightly different (optimized) version of WVO's answer:
splitted = []
for item in l:
    if '(Reg)' in item:
        splitted.append([])
    splitted[-1].append(item)
#[['1(Reg)', '100', '103', '102', '100'], ['2(Reg)', '98', '101', '100'],
# ['3(Reg)', '96', '99', '98'], ['4(Reg)', '100', '100', '100', '100'],
# ['5(Reg)', '98', '99', '99', '100'],
# ['6(Reg)', '99.47', '99.86', '99.67', '100']]
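The loop assumes the first element contains (Reg); otherwise the last-sublist lookup hits an empty list and raises IndexError. A hedged variant that also tolerates leading values, wrapped in a hypothetical function named split_on_marker:

```python
def split_on_marker(items, marker='(Reg)'):
    """Start a new sublist at every marker; leading non-marker items
    (if any) go into their own first sublist instead of raising."""
    groups = []
    for item in items:
        if marker in item or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups

print(split_on_marker(['1(Reg)', '100', '2(Reg)', '98']))
# [['1(Reg)', '100'], ['2(Reg)', '98']]
print(split_on_marker(['99', '1(Reg)', '100']))
# [['99'], ['1(Reg)', '100']]
```

Whether leading values should be kept, dropped, or treated as an error depends on the data, so this is only one possible choice.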
Upvotes: 6
Reputation: 476614
We can use a for loop with two lists: one builds the current row, and the other stores all the rows collected so far. Like:
rows = []
row = []
for word in data:
    if '(Reg)' in word:
        rows.append(row)
        row = []
    row.append(word)
rows.append(row)
with data being the initial list of strings.
There is a problem with this, however: it will first add an empty row (given that the first element has (Reg) in it). We can prevent this by only adding non-empty rows, like:
rows = []
row = []
for word in data:
    if '(Reg)' in word:
        if row:
            rows.append(row)
        row = []
    row.append(word)
if row:
    rows.append(row)
We can generalize the above into a dedicated function:
def split_at(data, predicate, with_empty=False):
    rows = []
    row = []
    for word in data:
        if predicate(word):
            # flush the row built so far before starting a new one
            if with_empty or row:
                rows.append(row)
            row = []
        row.append(word)
    # flush the final row
    if with_empty or row:
        rows.append(row)
    return rows
We can then call it like:
split_at(our_list, lambda x: '(Reg)' in x)
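A quick check of how the with_empty flag changes the result, with the definition repeated so the snippet runs on its own (our_list here is a shortened, illustrative input):

```python
def split_at(data, predicate, with_empty=False):
    rows = []
    row = []
    for word in data:
        if predicate(word):
            if with_empty or row:
                rows.append(row)
            row = []
        row.append(word)
    if with_empty or row:
        rows.append(row)
    return rows

our_list = ['1(Reg)', '100', '2(Reg)', '98']

def is_reg(x):
    return '(Reg)' in x

# Default: the empty row before the first marker is dropped.
print(split_at(our_list, is_reg))
# [['1(Reg)', '100'], ['2(Reg)', '98']]

# with_empty=True keeps it, which mirrors the first version of the loop.
print(split_at(our_list, is_reg, with_empty=True))
# [[], ['1(Reg)', '100'], ['2(Reg)', '98']]
```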
Upvotes: 2