allstar

Reputation: 1205

regex up to a list of strings (without capturing that last string)

I am trying to form a regular expression to match text between a start-word and the first of a list of stop-words. However, I do not want to include the stop-word in my match.

(The use case is replacing a section of a document, stopping before the keyword signifying the next section)

My regular expression is:

(StartWord)[\s\S]*?(StopWord1|StopWord2|$)


However, this match includes the stop-word. See the example here: http://regexr.com/38pb9

Any thoughts? Thank you!

Upvotes: 0

Views: 32

Answers (1)

p.s.w.g

Reputation: 149108

If your regex engine supports look aheads, you could just use this:

((StartWord)[\s\S]*?(?=StopWord1|StopWord2|$))

The lookahead makes the match stop as soon as a stop word or the end of the string is encountered, but the stop word itself is not consumed, so it is not part of the match.
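As a rough sketch of how this could be used for the replacement use case, here is the lookahead pattern in Python's `re` module. Python is only an assumption, since the question does not name a language, and the sample text and replacement string are made up for illustration.

```python
import re

# Hypothetical sample document; the words are placeholders from the question.
text = "Intro StartWord old body StopWord1 next section StopWord2 tail"

# (?=...) only asserts that a stop word or the end of input follows;
# it does not consume it, so the stop word stays out of the match.
pattern = re.compile(r"StartWord[\s\S]*?(?=StopWord1|StopWord2|$)")

matched = pattern.search(text).group(0)

# Replace the section; the stop word that follows is left untouched.
replaced = pattern.sub("StartWord new body ", text, count=1)

print(repr(matched))
print(replaced)
```

Because the stop word is never consumed, the replacement text does not need to put it back.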

If you also need to exclude the start word, you can use a look behind (again, assuming your regex engine supports it):

((?<=StartWord)[\s\S]*?(?=StopWord1|StopWord2|$))
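A minimal sketch of the lookbehind variant, again assuming Python's `re` module and made-up sample text. Note that Python only accepts fixed-width lookbehinds, which a literal start word satisfies; other engines have different restrictions.

```python
import re

# Hypothetical sample text for illustration.
text = "Intro StartWord old body StopWord1 next section"

# (?<=StartWord) requires the start word immediately before the match
# without including it; (?=...) does the same for the stop word after it.
pattern = re.compile(r"(?<=StartWord)[\s\S]*?(?=StopWord1|StopWord2|$)")

body = pattern.search(text).group(0)

# With no stop word present, the $ alternative lets the match run to the end.
no_stop = pattern.search("Intro StartWord runs to the end").group(0)

print(repr(body))
print(repr(no_stop))
```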

Of course, the simplest method may be to keep your existing pattern and add a group so you can extract only the part you need:

(StartWord)([\s\S]*?)(StopWord1|StopWord2|$)

Here, group 1 will contain the start word, group 2 will contain the body of the match, and group 3 will contain the stop word (or an empty string if the end of the string was reached first). In whatever language you're using, you can extract group 2 to get just the body.
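For example, the group-based approach might look like this in Python (an assumed language; the sample text and the replacement body are hypothetical). Since the stop word is consumed here, a replacement has to write groups 1 and 3 back explicitly.

```python
import re

# Hypothetical sample text for illustration.
text = "Intro StartWord old body StopWord2 tail"

# Group 1: start word, group 2: body, group 3: stop word (or "" at end of input).
pattern = re.compile(r"(StartWord)([\s\S]*?)(StopWord1|StopWord2|$)")

m = pattern.search(text)
start, body, stop = m.group(1), m.group(2), m.group(3)

# Swap only the body, re-emitting the start and stop words around it.
rebuilt = pattern.sub(
    lambda mo: mo.group(1) + " new body " + mo.group(3),
    text,
    count=1,
)

print(repr(body))
print(rebuilt)
```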

Upvotes: 2
