Python regex search lines that end with a colon and all text after until next line that ends with colon

Question

I have the following text:

Test 123:

This is a blue car

Test:

This car is not blue

This car is yellow

Hello:

This is not a test

I want to put together a regex that finds every line that starts with Test or Hello, is optionally followed by a three-digit number, and ends with a colon, and that returns all the content after that line up to the next line that fits the same description. So for the text above, the findall call would return a list of:

[("Test", "123", "\nThis is a blue car\n"),
 ("Test", "", "\nThis car is not blue\n\nThis car is yellow\n"),
 ("Hello", "", "\nThis is not a test")]

So far I got this:

r = re.findall(r'^(Test|Hello) *([^:]*):$', test, re.MULTILINE)

It matches each line according to the description but I'm unsure how to capture the content until the next line that ends with a colon. Any ideas?

Avinash Raj · Accepted Answer

You could use the regex below, which relies on the DOTALL modifier so that . also matches newlines:

(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)
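
To make the pieces of that pattern easier to follow, here is a sketch of the same regex written with re.VERBOSE so each part can carry a comment. The name SECTION_RE and the sample string are only illustrative. Note that in verbose mode unescaped whitespace is ignored, so the literal space is written as [ ].

```python
import re

# Same pattern as the one-line regex, spelled out with re.VERBOSE.
# SECTION_RE is an illustrative name, not something from the question.
SECTION_RE = re.compile(
    r"""
    (?:^|\n)                # start of the string, or the newline before a header
    (Test|Hello)            # group 1: the header keyword
    [ ]*                    # optional spaces after the keyword
    ([^:]*)                 # group 2: anything up to the colon, may be empty
    :\n                     # the colon ending the header line, then its newline
    (.*?)                   # group 3: the body, as short as possible
    (?=\n(?:Test|Hello)|$)  # stop before the next header keyword, or at the end
    """,
    re.VERBOSE | re.DOTALL,  # DOTALL lets . match newlines inside the body
)

text = (
    "Test 123:\n\nThis is a blue car\n\n"
    "Test:\n\nThis car is not blue\n\nThis car is yellow\n\n"
    "Hello:\n\nThis is not a test"
)

sections = SECTION_RE.findall(text)
for keyword, number, body in sections:
    print(repr(keyword), repr(number), repr(body))
```

The lazy (.*?) combined with the lookahead is what makes each body stop just before the next header instead of running to the end of the input.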

DEMO

>>> import re
>>> s = """Test 123:
... 
... This is a blue car
... 
... Test:
... 
... This car is not blue
... 
... This car is yellow
... 
... Hello:
... 
... This is not a test"""
>>> re.findall(r'(?s)(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)', s)
[('Test', '123', '\nThis is a blue car\n'), ('Test', '', '\nThis car is not blue\n\nThis car is yellow\n'), ('Hello', '', '\nThis is not a test')]
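
One thing to watch: the lookahead only checks that the next line begins with Test or Hello, so a body line that happens to start with one of those words would also be treated as a boundary. If that matters for your data, a possible variant (a sketch, not part of the regex above) is to require a complete header line in the lookahead. The names STRICT_RE and split_sections are made up for this example, and it assumes the optional number is exactly three digits, as described in the question.

```python
import re

# Hypothetical stricter variant: a header must be a whole line such as
# "Test:", "Hello:" or "Test 123:", not just a line starting with the keyword.
STRICT_RE = re.compile(
    r"^(Test|Hello) *((?:\d{3})?):[ \t]*\n"         # header: keyword, optional 3 digits, colon
    r"(.*?)"                                         # body, as short as possible
    r"(?=^(?:Test|Hello) *(?:\d{3})?:[ \t]*$|\Z)",   # next full header line, or end of input
    re.MULTILINE | re.DOTALL,
)

def split_sections(text):
    """Return (keyword, number, body) tuples with surrounding whitespace stripped from the body."""
    return [(kw, num, body.strip()) for kw, num, body in STRICT_RE.findall(text)]

# The second line of the first body starts with "Test" but is not a header.
sample = "Test 123:\nThis is a blue car\nTest drive it\nHello:\nThis is not a test"
sections = split_sections(sample)
print(sections)
```

Here the bodies are stripped, so the leading and trailing newlines shown in the output above are dropped. Remove the .strip() call if you need them preserved.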
