ayansalt2

Reputation: 175

How can I split a text by (a), (b)?

I want to split my text by subparts (a), (b), ...

import re

s = "(a) First sentence. \n(b) Second sentence. \n(c) Third sentence."

l = re.compile(r'\(([a-f]+)').split(s)

With my regex I get a list of 7 elements:

['', 'a', ') First sentence. \n', 'b', ') Second sentence. \n', 'c', ') Third sentence.']

but what I want is a list of 3 elements, where each item is a subpart label, (a), (b) or (c), followed by its sentence:

['(a) First sentence.', '(b) Second sentence.', '(c) Third sentence.']

Upvotes: 0

Views: 122

Answers (1)

Red

Reputation: 27567

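You can use a positive lookahead, (?=...), to split the string at every position that is immediately followed by a letter from a to f in parentheses. Because the lookahead consumes no characters, the (a), (b), ... markers stay in the results. Splitting on such a zero-width match requires Python 3.7 or later: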

import re

s = "(a) Lorem ipsum dolor sit amet, consectetur adipiscing elit. \n(b) Nullam porta aliquet ornare. Integer non ullamcorper nibh. Curabitur eu maximus odio. Mauris egestas fermentum ligula non fermentum. Sed tincidunt dolor porta egestas consequat. Nullam pharetra fermentum venenatis. Maecenas at tempor sapien, eu gravida augue. Fusce nec elit sollicitudin est euismod placerat nec ut purus. \n(c) Phasellus fermentum enim ex. Suspendisse ac augue vitae magna convallis dapibus."
l = re.compile(r'(?=\([a-f]\))').split(s)

print(l)

Output:

['', '(a) Lorem ipsum dolor sit amet, consectetur adipiscing elit. \n', '(b) Nullam porta aliquet ornare. Integer non ullamcorper nibh. Curabitur eu maximus odio. Mauris egestas fermentum ligula non fermentum. Sed tincidunt dolor porta egestas consequat. Nullam pharetra fermentum venenatis. Maecenas at tempor sapien, eu gravida augue. Fusce nec elit sollicitudin est euismod placerat nec ut purus. \n', '(c) Phasellus fermentum enim ex. Suspendisse ac augue vitae magna convallis dapibus.']

If you don't want the empty string(s), you can use filter:

l = list(filter(None, l))

If you don't want the trailing newlines on each string, you can use map with str.strip, which removes whitespace from both ends:

l = list(map(str.strip, l))

or, to remove only trailing whitespace, str.rstrip:

l = list(map(str.rstrip, l))
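Putting the steps together, here is a minimal sketch applied to the string from the question. The names pattern and parts are only illustrative, and a list comprehension is swapped in for filter and map so that stripping and dropping empty pieces happen in one pass:

```python
import re

s = "(a) First sentence. \n(b) Second sentence. \n(c) Third sentence."

# Zero-width split: cut at every position that is followed by "(x)",
# where x is a letter from a to f (needs Python 3.7+).
pattern = re.compile(r'(?=\([a-f]\))')

# Strip whitespace from both ends of each piece and skip pieces that
# end up empty (such as the leading '' before "(a)").
parts = [piece.strip() for piece in pattern.split(s) if piece.strip()]

print(parts)
# ['(a) First sentence.', '(b) Second sentence.', '(c) Third sentence.']
```

If the labels can go beyond f, the character class would need widening, for example to [a-z].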

Upvotes: 2
