Capturing text inside keywords using regular expression

Question

I'm trying to capture groups of lines that each begin with a special keyword and are separated from one another by blank lines.

text = """
KeyWord some text
Data: 012
***coconut***
list[123]
par(098)
Finish me


KeyWord random random text
Data: 1257
Cowboy
***mango***
list[121343]
par(afsd)
Catwoman
Tamarindo
Gotic
Gotham




KeyWord another text
Data: 532
***banana***
It can have more lines
And more
And more
list[dhf]
par(345)


"""

As you can see, every 'paragraph' starts with KeyWord and has a different number of lines. The paragraphs are separated by n blank lines. I want to grab each paragraph and put it into a list, so I can later iterate over that list. The final list should have length 3 and should contain only lines with characters, no blank lines.

I tried the following with no success:

pattern = re.compile(r'KeyWord .+KeyWord',re.DOTALL)

The fourth bird · Accepted Answer

You could get the matches without using re.DOTALL to prevent unnecessary backtracking.
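For context, here is a small sketch of what the pattern from the question does. The shortened `sample` string is an assumption standing in for the original text. With re.DOTALL the greedy `.+` runs to the end of the string and then backtracks to the last KeyWord, so everything collapses into one match:

```python
import re

# Shortened stand-in for the question's text (an assumption, for brevity).
sample = "KeyWord a\nx\n\nKeyWord b\ny\n\nKeyWord c\nz\n"

# The pattern tried in the question: greedy .+ with DOTALL crosses newlines.
attempt = re.compile(r"KeyWord .+KeyWord", re.DOTALL)
matches = attempt.findall(sample)

# A single match spanning from the first KeyWord to the last one.
print(len(matches))
print(repr(matches[0]))
```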

If KeyWord is always at the start of a line, you could use the anchor ^ together with re.MULTILINE:

^KeyWord\b.*(?:\r?\n(?!KeyWord\b).*)*

Explanation

^KeyWord\b Start of line, match KeyWord and word boundary
.* Match 0+ times any char except a newline
(?: Non-capturing group
\r?\n Match a newline, optionally preceded by a carriage return
(?!KeyWord\b).* Assert what is directly to the right is not KeyWord and match the whole line
)* Close group and repeat 0+ times
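The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of the same idea; the `sample` string is a made-up, shortened input.

```python
import re

pattern = re.compile(
    r"""
    ^KeyWord\b .*          # a line that starts with KeyWord
    (?:                    # non-capturing group, one following line per pass
        \r?\n              # a newline, optionally preceded by a carriage return
        (?!KeyWord\b) .*   # the next line, unless it starts with KeyWord
    )*                     # repeat zero or more times
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical short input with two paragraphs.
sample = "KeyWord a\nx\n\nKeyWord b\ny\n"
blocks = pattern.findall(sample)
print(blocks)
```

Because the group consumes blank lines as well, each match keeps any blank lines that follow the paragraph, up to the line before the next KeyWord.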


Example code

import re

result = re.findall(r"^KeyWord\b.*(?:\r?\n(?!KeyWord\b).*)*", text, re.MULTILINE)
print(result)
print(len(result))

Output

['KeyWord some text\nData: 012\n***coconut***\nlist[123]\npar(098)\nFinish me\n\n', 'KeyWord random random text\nData: 1257\nCowboy\n***mango***\nlist[121343]\npar(afsd)\nCatwoman\nTamarindo\nGotic\nGotham\n\n\n\n', 'KeyWord another text\nData: 532\n***banana***\nIt can have more lines\nAnd more\nAnd more\nlist[dhf]\npar(345)\n\n\n']
3
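Each match still carries the blank lines that follow its paragraph. Since the question asks for only lines with characters, one possible follow-up step is to split every match into lines and drop the empty ones. This is a sketch, not part of the original answer: the shortened `text` and the name `paragraphs` are assumptions for illustration.

```python
import re

# Shortened stand-in for the question's text (an assumption, for brevity).
text = """
KeyWord some text
Data: 012
Finish me


KeyWord another text
Data: 532

par(345)


"""

pattern = re.compile(r"^KeyWord\b.*(?:\r?\n(?!KeyWord\b).*)*", re.MULTILINE)

# One inner list per paragraph, keeping only lines with non-whitespace characters.
paragraphs = [
    [line for line in block.splitlines() if line.strip()]
    for block in pattern.findall(text)
]

for lines in paragraphs:
    print(lines)
```

If one string per paragraph is preferred, the inner lists can be rejoined with "\n".join(lines).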
