Bahnzo

Reputation: 88

Regex, how to match everything up to nth occurrence

I'm trying to get everything from a webpage up to the second occurrence of the word "matchdate".

(.*?matchdate){2} is what I'm trying, but that's not doing the trick. The page has 14+ occurrences of "matchdate", and I only want everything up to the second one and nothing after it.

https://regex101.com/r/Cjyo0f/1 <--- my saved regex.

What am I missing here?

Thanks.

Upvotes: 2

Views: 4510

Answers (2)

Raymond Hettinger

Reputation: 226199

You almost had it! (.*?matchdate){2} is actually correct. It just needs the re.DOTALL flag so that the dot matches newlines as well as every other character.

Here is a working test:

>>> import re

>>> s = '''First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate and other
stuff you're
not interested in
like another matchdate
or a matchdate redux.
'''

>>> print(re.search('(.*?matchdate){2}', s, re.DOTALL).group())
First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate
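
To make the effect of the flag visible, here is a minimal sketch that runs the same pattern with and without re.DOTALL. The sample text and the variable names are made up for illustration; they are not from the question's page.

```python
import re

# Made-up sample text: no single line contains "matchdate" twice.
text = "line one\nhas matchdate here\nline three\nmatchdate again\nand a third matchdate\n"
pattern = r"(.*?matchdate){2}"

# Without DOTALL the dot stops at each newline, so two occurrences
# on different lines can never be part of the same match.
no_flag = re.search(pattern, text)

# With DOTALL the dot also matches "\n", so the match can span lines.
with_flag = re.search(pattern, text, re.DOTALL)

print(no_flag)                   # None
print(repr(with_flag.group()))   # text up to and including the second "matchdate"
```

If passing a flag is not convenient, the inline form (?s) at the start of the pattern should behave the same way as re.DOTALL.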

Upvotes: 3

Neil

Reputation: 14313

There are a couple of ways you can do this:

If you can, remove the g flag

Without the global flag, the regex engine stops after the first match it finds instead of continuing through the rest of the input.

https://regex101.com/r/Cjyo0f/2
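
In Python there is no g flag as such; as a rough analogy, re.search behaves like a non-global match and re.finditer like a global one. A small sketch, using a made-up one-line sample string:

```python
import re

# Made-up sample with four occurrences of the word.
text = "a matchdate b matchdate c matchdate d matchdate e"
pattern = re.compile(r"(.*?matchdate){2}", re.DOTALL)

# search() returns only the first match, like a regex without the g flag.
first_only = pattern.search(text).group()

# finditer() keeps going after each match, like a regex with the g flag.
all_matches = [m.group() for m in pattern.finditer(text)]

print(first_only)
print(all_matches)
```

Here finditer yields a second match starting right after the first one, which is the extra output the g flag produces on regex101.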

Add a ^ to the front of the regex

A caret forces the regex to match from the beginning of the string, which rules out every later match. This assumes the multiline flag is off; with it on, ^ also matches at the start of each line.

https://regex101.com/r/Cjyo0f/3
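
A short sketch of the anchoring behaviour in Python, again with a made-up sample string. It also shows what changes once re.MULTILINE is added, since that flag changes what ^ means:

```python
import re

# Made-up sample: two lines, each containing the word twice.
text = "a matchdate b matchdate\nc matchdate d matchdate\n"

# Anchored, no MULTILINE: ^ matches only at the very start of the string,
# so even a "global" search can produce at most one match.
anchored = [m.group() for m in re.finditer(r"^(.*?matchdate){2}", text, re.DOTALL)]

# With MULTILINE, ^ also matches after every newline, so a second
# match can start at the beginning of the second line.
multiline = [
    m.group()
    for m in re.finditer(r"^(.*?matchdate){2}", text, re.DOTALL | re.MULTILINE)
]

print(anchored)
print(multiline)
```

In Python, \A can be used instead of ^ when the anchor should mean "start of string" regardless of flags.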

If Python is available, use .split() and .join()

If plain Python is available, I would recommend:

string = "I like to matchdate, I want to eat matchdate for breakfast"
# Keep the first two pieces and rejoin them: everything before the second "matchdate".
print("matchdate".join(string.split("matchdate")[:2]))
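
The same split-and-join idea can be generalised to the nth occurrence. The helper below is a hypothetical sketch, not part of any library: the name up_to_nth, the inclusive parameter, and the choice to return the whole text when there are fewer than n occurrences are all assumptions made for illustration.

```python
def up_to_nth(text, word, n, inclusive=False):
    """Return the part of text before the n-th occurrence of word.

    Hypothetical helper. If inclusive is true, the n-th occurrence itself
    is kept at the end. If word occurs fewer than n times, the whole text
    is returned unchanged (an arbitrary choice for this sketch).
    """
    # maxsplit=n stops after n splits, giving at most n + 1 pieces.
    parts = text.split(word, n)
    if len(parts) <= n:
        return text  # fewer than n occurrences found
    head = word.join(parts[:n])
    return head + word if inclusive else head


sample = "I like to matchdate, I want to eat matchdate for breakfast"
print(up_to_nth(sample, "matchdate", 2))
print(up_to_nth(sample, "matchdate", 2, inclusive=True))
```

Unlike the regex, the plain split/join version stops just before the second "matchdate"; the inclusive option is there for when the word itself should be part of the result.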

Upvotes: 2
