Reputation: 3
New to Regex, please help!
Example String:
START
blahblah
blahblah blahblah
blahblahblahblah
blahblah KEYWORD blah
blahblah
blah
END
Problem: I would like to locate the entire string (between START and END) containing a certain KEYWORD.
Context: I have a large file with multiple iterations of the multi-line START*END example string and need to sort these strings based on the KEYWORD they contain. Each string contains the same START and END, but a different KEYWORD.
What I have so far:
START\s[\s\S]*?(?=END\s|\Z) returns the entire string, but is not specific to a KEYWORD
Not sure how to go about finding the entire string based on the KEYWORD.
Any help would be appreciated.
Thanks!
Upvotes: 0
Views: 239
Reputation: 198314
(?s)(?<=START)(?:(?!END).)*?(?:KEYWORD1|KEYWORD2)(?:.*?)(?=END)
(regex101) First, (?s) makes . match newlines too, so a newline counts as "any character". The match starts just after START and ends just before END. In between, we want as few characters as possible that do not begin the string END, followed by KEYWORD1 or KEYWORD2, followed by as few characters of any kind as possible.
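As a quick check of how this pattern behaves, here is a minimal sketch in Python (the question does not say which regex engine is in use, so Python's re module is an assumption, and the sample text is made up for illustration). Only the block that contains one of the listed keywords is matched.

```python
import re

# The pattern from the answer, unchanged. (?s) lets "." match newlines.
pattern = re.compile(
    r"(?s)(?<=START)(?:(?!END).)*?(?:KEYWORD1|KEYWORD2)(?:.*?)(?=END)"
)

# Hypothetical sample: the first block has no listed keyword, the second does.
text = (
    "START\nblah\nblah OTHER blah\nEND\n"
    "START\nblahblah\nblahblah KEYWORD1 blah\nblah\nEND\n"
)

# No capturing groups are used, so findall returns the full matches,
# i.e. everything between START and END (both excluded).
matches = pattern.findall(text)

for body in matches:
    print(repr(body))
```

Note that the (?!END) guard stops at any occurrence of the letters END, so a block whose body contains END inside a longer word would be cut short under this pattern.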
This is based on the assumption that you have a finite list of keywords. If, on the other hand, keywords are identified by some other means, then you should see Michael Butscher's comment first.
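For the sorting task described in the question, a two-step approach may be easier to maintain than a single regex: extract every block first, then check each block for a keyword. The sketch below is only one possible way to do that and is not necessarily what the referenced comment proposes. It assumes Python, assumes START and END each sit on their own line, and uses a hypothetical helper name, group_blocks.

```python
import re
from collections import defaultdict

# Assumption: START and END each occupy a whole line.
# (?m) makes ^ and $ match at line boundaries, (?s) lets "." span newlines.
BLOCK = re.compile(r"(?ms)^START$.*?^END$")


def group_blocks(text, keywords):
    """Group START...END blocks by the first listed keyword they contain."""
    groups = defaultdict(list)
    for block in BLOCK.findall(text):
        for keyword in keywords:
            if keyword in block:  # plain substring test, no regex needed
                groups[keyword].append(block)
                break  # file each block under one keyword only
    return groups


if __name__ == "__main__":
    sample = (
        "START\na KEYWORD1 b\nEND\n"
        "START\nc KEYWORD2 d\nEND\n"
        "START\ne KEYWORD1 f\nEND\n"
    )
    for keyword, blocks in group_blocks(sample, ["KEYWORD1", "KEYWORD2"]).items():
        print(keyword, len(blocks))
```

Blocks that contain none of the listed keywords are skipped here. If keywords follow a pattern rather than a fixed list, the substring test could be swapped for a second, smaller regex applied to each block.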
Upvotes: 2