Reputation: 2253

Python: find a string between 2 strings in text

I have a text like this

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

I want to extract the Literature section, but my pattern fails to detect it.

Here is what I use:

re.search("(1) Literature\n\n(.*).\n\n", s).group(1)

but search returns None.

The desired output is

(1) Literature

1. a.
2. b.
3. c.

What did I do wrong?

Upvotes: 0

Answers (4)

anubhava

Reputation: 784958

You may use this regex with a capture group:

r'\(1\)\s+Literature\s+((?:.+\n)+)'

Explanation:

\(1\): Match (1) text
\s+: Match 1+ whitespaces
Literature: Match the literal text Literature
\s+: Match 1+ whitespaces
(: Start capture group #1
- (?:.+\n)+: Match a line with 1+ character followed by newline. Repeat this 1 or more times to allow it to match multiple such lines
): End capture group #1
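
As a rough illustration (not part of the original answer), the pattern can be applied with re.search as sketched here. The variable names original, m, body and section are placeholders chosen for this example. The sketch also shows why the attempt in the question returns None: left unescaped, (1) is a capture group that matches the bare character 1, so the literal parentheses in the text are never matched.

```python
import re

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

# The pattern from the question: "(1)" is parsed as a group matching "1",
# so the regex looks for "1 Literature", which never occurs in s.
original = re.search("(1) Literature\n\n(.*).\n\n", s)
print(original)  # None

# Escaping the parentheses and matching whole non-empty lines fixes it.
m = re.search(r'\(1\)\s+Literature\s+((?:.+\n)+)', s)
if m:
    body = m.group(1)              # only the numbered lines
    section = m.group(0).rstrip()  # heading plus numbered lines
    print(section)
```

Group 1 holds only the numbered lines. The whole match, group(0), also includes the heading, which is closer to the output the question asks for.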

Upvotes: 1

zr0gravity7

Reputation: 3194

Regex for capturing the generic question with that structure:

\(\d+\)\s+(\w+)\s+((?:\d+\.\s.+\n)+)

It will capture the title "Literature", then the choices in another group (for a total of 2 groups).

A repeated capture group keeps only its last match, so to get each of your "1. a." items separately you have to match the second group from above again, with this pattern:

(\d+\.\s+.+) and match it globally (for example with re.findall) to get all items.
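
A minimal sketch of this two-step approach in Python, assuming the first pattern begins with escaped parentheses, \(\d+\), and using re.findall for the global match. The names section_re, title, block, items and last_only are illustrative, not from the answer.

```python
import re

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

# Step 1: capture the title and the whole block of numbered lines.
section_re = re.compile(r'\(\d+\)\s+(\w+)\s+((?:\d+\.\s.+\n)+)')
m = section_re.search(s)
title = m.group(1)   # section title
block = m.group(2)   # all numbered lines as one string

# Step 2: match each numbered line separately within the block.
items = re.findall(r'\d+\.\s+.+', block)
print(title, items)

# For comparison: a capture group inside a repetition keeps only
# the text from its final iteration.
last_only = re.search(r'(?:(\d+\.\s+.+)\n)+', block).group(1)
print(last_only)
```

The last two lines illustrate why the second pass is needed: the repeated group reports only the final item.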

Upvotes: 0

The fourth bird

Reputation: 163207

You could match (1) Literature and 2 newlines, and then capture all lines that start with digits followed by a dot.

\(1\) Literature\n\n((?:\d+\..*(?:\n|$))+)

The pattern matches:

\(1\) Literature\n\n Match (1) Literature and 2 newlines
( Capture group 1
- (?: Non capture group
  - \d+\..*(?:\n|$) Match 1+ digits and a dot, then the rest of the line, followed by either a newline or the end of the string
- )+ Close non capture group and repeat it 1 or more times to match all the lines
) Close group 1
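
One way to try this pattern in Python is sketched here. The second input, no_trailing_newline, is a made-up string added for this example to show what the (?:\n|$) alternative is for. The variable names are illustrative.

```python
import re

pattern = re.compile(r'\(1\) Literature\n\n((?:\d+\..*(?:\n|$))+)')

s = """
...

(1) Literature

1. a.
2. b.
3. c.

...
"""

# Every numbered line ends with a newline here, so the \n branch is used.
m = pattern.search(s)
numbered = m.group(1)
print(numbered.rstrip())

# Hypothetical input where the list is the very end of the string:
# the last line has no newline, so the $ branch matches instead.
no_trailing_newline = "(1) Literature\n\n1. a.\n2. b."
tail = pattern.search(no_trailing_newline).group(1)
print(tail)
```

Because the heading must be followed by exactly two newlines, this pattern is stricter about spacing than one that uses \s+ between the parts.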

Another option is to capture all following lines that do not start with ( digits ) using a negative lookahead, and then trim the leading and trailing whitespaces.