Reputation: 15
I have the following representative data:
Lots of text, lots of text
PATTERN2
PATTERN2
text PATTERN1 text
text
text
..
..
text
PATTERN2
PATTERN2
PATTERN2
PATTERN2
PATTERN2
..
..
PATTERN2
Basically I want to capture all of the instances of PATTERN2 but only after PATTERN1 shows up in the file.
PATTERN1 is a few characters, and PATTERN2 starts with a timestamp (HH:MM:SS.sss). I need to capture the entire line when PATTERN2 is found. Also worth noting: PATTERN2 shows up all over the txt file, but I only want to match it after PATTERN1 has been found.
I've tried various regex expressions (I'm a newb and am fumbling) to no avail. I'm testing with https://regexr.com/ and https://regex101.com, but ultimately it's going to be used in a Python script.
Any help would be greatly appreciated!
Upvotes: 0
Views: 289
Reputation: 520958
One approach is to split the input on the first PATTERN1 and then search only the text that follows it:
import re

inp = """Lots of text, lots of text
PATTERN2
PATTERN2
text PATTERN1 text
text
text
..
..
text
PATTERN2
PATTERN2
PATTERN2
PATTERN2
PATTERN2
..
..
PATTERN2"""
matches = []
if re.search(r'\bPATTERN1\b', inp):
    # Keep only the text after the first PATTERN1 occurrence.
    text = re.split(r'\bPATTERN1\b', inp, maxsplit=1)[1]
    matches = re.findall(r'\bPATTERN2\b', text)
print(matches)
# ['PATTERN2', 'PATTERN2', 'PATTERN2', 'PATTERN2', 'PATTERN2', 'PATTERN2']
Here we first check that the input text contains the PATTERN1 marker. If it does not, there are no matches. Otherwise, we do a regex split to get the text occurring after the first PATTERN1 occurrence. Finally, re.findall finds all the PATTERN2 occurrences in this target text.
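Since the question asks for the entire line and says PATTERN2 begins with an HH:MM:SS.sss timestamp, here is a rough sketch of how the same idea could capture whole lines. The real patterns were not posted, so the marker string, the timestamp regex, the helper name lines_after_marker, and the sample text are all assumptions made for illustration.

```python
import re

# Assumed stand-ins: the actual PATTERN1 text and line layout were not given.
MARKER = "PATTERN1"
# A line that starts with HH:MM:SS.sss; with re.MULTILINE, ^ and $ anchor
# at line boundaries, so each match is one complete line.
TIMESTAMP_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3}.*$", re.MULTILINE)


def lines_after_marker(text, marker=MARKER):
    """Return each full timestamped line found after the first marker."""
    _, found, tail = text.partition(marker)
    if not found:
        # Marker never appears, so nothing should match.
        return []
    return TIMESTAMP_LINE.findall(tail)


sample = """12:00:00.000 before the marker, ignored
text PATTERN1 text
plain text
12:00:01.250 first event
..
12:00:02.500 second event"""

print(lines_after_marker(sample))
# ['12:00:01.250 first event', '12:00:02.500 second event']
```

str.partition is used here instead of re.split only because the assumed marker is a literal string; if PATTERN1 is itself a regex, the re.split call from the answer would fit better.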
Upvotes: 1