Regex to get text between multiple newlines in Python

Question

I am trying to split a text where it is between \n\n and \n, in that order. Take this string for example:

'\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it.'

My desired output is:

[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

I want to parse it like this because anything between \n\n and \n is a title, and the rest is the text under that title (so "Healthy Fruits" and "Sour Fruits" are the titles). I am not sure if this is the best way to grab the titles and their text.

dawg · Accepted Answer

Given:

txt='''\


My take on fruits.

Healthy Fruits
An apple is a fruit and it\'s very good.

Pears are good as well. Bananas are very good too and healthy.

Sour Fruits
Oranges are on the sour side and contains a lot of vitamin C.

Grapefruits are even more sour, if you can believe it.'''

desired=[('Healthy Fruits',
          "An apple is a fruit and it's very good.",
          'Pears are good as well. Bananas are very good too and healthy.'),
         ('Sour Fruits',
          'Oranges are on the sour side and contains a lot of vitamin C.',
          'Grapefruits are even more sour, if you can believe it.')]

You can use the regex:

r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)'
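To make the pattern easier to read, here is a minimal sketch that spells it out with re.VERBOSE so each piece can carry a comment. The names SECTION, sample and blocks are only illustrative, and sample is made-up data rather than the text from the question.

```python
import re

# The answer's pattern, written with re.VERBOSE so each part can be commented.
SECTION = re.compile(r'''
    \n\n               # a blank line precedes every block
    ([\s\S]*?)         # capture lazily; [\s\S] also matches newlines
    (?=                # stop right before either:
        \n\n.*\n[^\n]  #   a blank line, one title line, then text on the next line
      | \Z             #   or the end of the string
    )
''', re.VERBOSE)

# Made-up sample: an intro line, then two titled sections.
sample = "\n\nIntro.\n\nTitle A\nFirst.\n\nSecond.\n\nTitle B\nThird."
blocks = SECTION.findall(sample)
print(blocks)
# ['Intro.', 'Title A\nFirst.\n\nSecond.', 'Title B\nThird.']
```

The lookahead is what keeps a plain paragraph such as "Second." inside the current section: the blank line before it is not followed by a title line with text directly underneath, so the lazy capture keeps going.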

Python demo:

>>> import re
>>> sp=[tuple(re.split(r'\n+',l)) for l in re.findall(r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)',txt) if '\n' in l]

>>> sp
[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

>>> sp==desired
True
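If the lookahead feels heavy, the same grouping can probably be done by splitting on blank lines first and then attaching each paragraph to the most recent title. This is only a sketch of an alternative, not the approach used in the demo; group_sections and sample are hypothetical names, and it assumes a title is always followed by text on the very next line.

```python
import re

def group_sections(text):
    """Group blank-line-separated blocks into (title, paragraph, ...) tuples."""
    sections = []
    for block in re.split(r'\n{2,}', text.strip('\n')):
        lines = block.split('\n')
        if len(lines) > 1:
            # A block with an internal newline is "title\nfirst paragraph",
            # so it opens a new section.
            sections.append(lines)
        elif sections:
            # A single-line block continues the most recent section.
            sections[-1].append(block)
        # Single-line blocks before the first title are dropped.
    return [tuple(s) for s in sections]

# Made-up sample: an intro line, then two titled sections.
sample = "\n\nIntro.\n\nTitle A\nFirst.\n\nSecond.\n\nTitle B\nThird."
result = group_sections(sample)
print(result)
# [('Title A', 'First.', 'Second.'), ('Title B', 'Third.')]
```

Like the `if '\n' in l` filter in the one-liner, this drops any leading text that has no title line above it.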
