Reputation: 473
I am trying to split a text at the parts that fall between \n\n and \n, in that order. Take this string, for example:
\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it.
My desired output is:
[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]
I want to parse it like this because anything between \n\n and \n is a title, and the rest is the text under that title (so "Healthy Fruits" and "Sour Fruits" are the titles). I'm not sure if this is the best way to grab the titles and their text.
Upvotes: 1
Views: 63
Reputation: 103754
Given:
txt='''\
\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it.'''
desired=[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]
You can use the regex:
r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)'
Python demo:
>>> import re
>>> sp=[tuple(re.split('\n+',l)) for l in re.findall(r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)',txt) if '\n' in l]
>>> sp
[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]
>>> sp==desired
True
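To make the pattern easier to follow, here is a sketch of the same regex written with re.VERBOSE so each piece can carry a comment. The names SECTION and split_sections are made up for illustration, and the inner non-capturing group is dropped because the lookahead already groups the alternatives.

```python
import re

# Sample input copied from the question.
txt = ("\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it's very good."
       "\n\nPears are good as well. Bananas are very good too and healthy."
       "\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C."
       "\n\nGrapefruits are even more sour, if you can believe it.")

# SECTION is a hypothetical name; the pattern is the one shown above.
SECTION = re.compile(r"""
    \n\n                 # a blank line opens a candidate block
    ([\s\S]*?)           # lazily capture everything, newlines included
    (?=                  # stop right before either of:
        \n\n.*\n[^\n]    #   a blank line, a title line, then a text line
      | \Z               #   the end of the string
    )
""", re.VERBOSE)


def split_sections(text):
    """Return one tuple per section: (title, paragraph, paragraph, ...)."""
    return [
        tuple(re.split(r"\n+", block))
        for block in SECTION.findall(text)
        if "\n" in block  # blocks without a newline have no title line, so skip them
    ]


sections = split_sections(txt)
```

The `if "\n" in block` filter is what drops the leading "My take on fruits." paragraph, since it has no title line of its own.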
Upvotes: 1
Reputation: 616
This is not regex, but it works:
text="\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it."
NewList=[]
Newtext=text.split("\n\n")  # paragraphs are separated by blank lines
for line in Newtext:
    # a paragraph containing "\n" is a title followed by its first line of text
    if line.find("\n")>=0:
        NewList.extend(line.split('\n'))
# glue the final paragraph onto the last collected entry
NewList[-1]=NewList[-1]+Newtext[-1]
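If the tuples from the question are needed rather than a flat list, the same split-based idea can be extended as in the sketch below. It is only a sketch: group_by_title is a made-up name, the input is the string from the question (not the shortened one above), and it assumes every title is immediately followed by a single "\n" and its first paragraph.

```python
def group_by_title(text):
    """Group blank-line-separated paragraphs under the title that precedes them."""
    sections = []
    for block in text.split("\n\n"):
        if "\n" in block:
            # "title\nfirst paragraph" starts a new section
            sections.append(block.split("\n"))
        elif sections and block:
            # a plain paragraph belongs to the most recent title;
            # paragraphs seen before any title are ignored
            sections[-1].append(block)
    return [tuple(section) for section in sections]


# Sample input copied from the question.
txt = ("\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it's very good."
       "\n\nPears are good as well. Bananas are very good too and healthy."
       "\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C."
       "\n\nGrapefruits are even more sour, if you can believe it.")

result = group_by_title(txt)
```

Keeping each section as a list while it is being built and converting to tuples at the end avoids rebuilding a tuple for every appended paragraph.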
Upvotes: 1