I have a string like the one below:
result = """The following table provides the details.
acquired, by major class:
(US$ in millions) Customer relationships 15year $265
There is another line without space here.
Another table starts here:
(USS in millions) 2018 2017
Income (loss) from continuing operations $298 $129"""
I need to take all the lines that contain 3 or more consecutive spaces and put them in a list of lists. Here is what I have tried so far:
import re

lines = result.splitlines()
table_list = []
for line in lines:
    # keep every line that contains a run of 3 or more spaces
    if re.search(r' {3,}', line):
        table_list.append(line)
Output of the above code:
['(US$ in millions) Customer relationships 15year $265','(USS in millions) 2018 2017','Income (loss) from continuing operations $298 $129']
Expected output:
[['(US$ in millions) Customer relationships 15year $265'],['(USS in millions) 2018 2017','Income (loss) from continuing operations $298 $129']]
Further explanation of the output condition: the expected output is a list of lists. While iterating through the lines, consecutive lines that contain 3 or more spaces between two words should all go into the same inner list. A line that does not contain 3 or more spaces between two words breaks the chain, and the next line that does contain them starts a new inner list.
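For comparison, the rule above can be written as a plain loop without `groupby`. This is a minimal sketch, not the accepted answer: the function name `group_table_lines` and the `sample` text are hypothetical, and the sample writes the column gaps as explicit runs of three spaces so the pattern has something to match.

```python
import re

def group_table_lines(text):
    """Collect runs of consecutive lines containing 3+ spaces into sublists."""
    groups = []
    current = []  # the run of matching lines currently being built
    for line in text.splitlines():
        if re.search(r' {3,}', line):
            current.append(line)  # line starts or continues a run
        elif current:
            groups.append(current)  # a non-matching line ends the run
            current = []
    if current:
        groups.append(current)  # flush a run that reaches the end of the text
    return groups

# Hypothetical sample: column gaps written as explicit runs of three spaces
sample = (
    "The following table provides the details.\n"
    "acquired, by major class:\n"
    "(US$ in millions)   Customer relationships   15year   $265\n"
    "There is another line without space here.\n"
    "Another table starts here:\n"
    "(USS in millions)   2018   2017\n"
    "Income (loss) from continuing operations   $298   $129"
)
print(group_table_lines(sample))
```

The explicit `current` list makes the "chain" visible: it grows while lines match and is pushed to `groups` as soon as a non-matching line (or the end of the text) is reached.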
Use itertools.groupby with re.findall:
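import re
from itertools import groupby

def has_spaces(str_):
    # True if the line contains a run of 3 or more whitespace characters
    return bool(re.findall(r"\s{3,}", str_))
# Group consecutive lines by has_spaces and keep only the groups where it is True
[list(g) for k, g in groupby(result.splitlines(), key=has_spaces) if k]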
Output:
[['(US$ in millions) Customer relationships 15year $265'],
['(USS in millions) 2018 2017',
'Income (loss) from continuing operations $298 $129']]