Match pattern on new lines after first pattern is found?

Question

I have the following representative data:

Lots of text, lots of text
PATTERN2
PATTERN2
text PATTERN1 text
text
text
..
..
text
PATTERN2
PATTERN2
PATTERN2
PATTERN2
PATTERN2
..
..
PATTERN2

Basically I want to capture all of the instances of PATTERN2 but only after PATTERN1 shows up in the file.

PATTERN1 is a few characters, and PATTERN2 starts with a Timestamp (HH:MM:SS.sss) and I need to capture the entire line when PATTERN2 is found. Also worth noting that PATTERN2 shows up all over the txt file, but I only want to match PATTERN2 after PATTERN1 has been found.

I've tried various regular expressions (I'm a newbie and am fumbling) to no avail. I'm testing with https://regexr.com/ and https://regex101.com, but ultimately it's going to be used in a Python script.

Any help would be greatly appreciated!

Tim Biegeleisen · Accepted Answer

One approach makes judicious use of the basic functions in Python's re module:

import re

inp = """Lots of text, lots of text
PATTERN2
PATTERN2
text PATTERN1 text
text
text
..
..
text
PATTERN2
PATTERN2
PATTERN2
PATTERN2
PATTERN2
..
..
PATTERN2"""

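matches = []
# Only search if the PATTERN1 marker is present at all.
if re.search(r'\bPATTERN1\b', inp):
    # Keep only the text after the first PATTERN1 occurrence.
    text = re.split(r'\bPATTERN1\b', inp, maxsplit=1)[1]
    # Collect every PATTERN2 in that remaining text.
    matches = re.findall(r'\bPATTERN2\b', text)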

print(matches)
# ['PATTERN2', 'PATTERN2', 'PATTERN2', 'PATTERN2', 'PATTERN2', 'PATTERN2']

Here we first check that the input text contains the PATTERN1 marker. If it does not, there are no matches. Otherwise, we do a regex split to get the text occurring after the first PATTERN1 occurrence. Finally, re.findall finds all the PATTERN2 occurrences in that remaining text.
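
The question also asks for the entire line whenever PATTERN2 is found, where PATTERN2 begins with a timestamp (HH:MM:SS.sss). A minimal sketch of how the same split-then-search idea could return whole lines is shown here. The sample data, the ERROR marker standing in for PATTERN1, the timestamp regex, and the helper name lines_after_marker are all assumptions made for illustration; the real patterns would need to be substituted.

```python
import re

# Hypothetical sample data: "ERROR" stands in for PATTERN1, and the
# timestamped lines stand in for PATTERN2.
sample = """12:00:01.000 early entry
some text
text ERROR text
more text
12:00:02.123 first entry after marker
not a timestamped line
12:00:03.456 second entry after marker"""

# Assumed shape of PATTERN2: a line that begins with HH:MM:SS.sss.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
timestamp_line = re.compile(r'^\d{2}:\d{2}:\d{2}\.\d{3}.*$', re.MULTILINE)


def lines_after_marker(text, marker_regex):
    """Return whole timestamped lines found after the first marker match."""
    parts = re.split(marker_regex, text, maxsplit=1)
    if len(parts) < 2:
        # The marker never appears, so nothing qualifies.
        return []
    # The last element is the text following the first marker match.
    return timestamp_line.findall(parts[-1])


result = lines_after_marker(sample, r'\bERROR\b')
print(result)
```

Because the timestamp pattern contains no capture groups, re.findall returns each full matching line. The timestamped line that appears before the marker is skipped, since only the text after the split is searched.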
