Reputation: 88
I'm trying to get everything from a webpage up until the second occurrence of the word matchdate.
(.*?matchdate){2}
is what I'm trying, but that's not doing the trick. The page has 14+ matches of "matchdate" and I only want to get everything up to the second one, and then nothing else.
https://regex101.com/r/Cjyo0f/1 <--- my saved regex.
What am I missing here?
Thanks.
Upvotes: 2
Views: 4510
Reputation: 226199
You almost had it! (.*?matchdate){2} was actually correct. It just needs the re.DOTALL flag so that the dot matches newlines as well as other characters.
Here is a working test:
>>> import re
>>> s = '''First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate and other
stuff you're
not interested in
like another matchdate
or a matchdate redux.
'''
>>> print(re.search('(.*?matchdate){2}', s, re.DOTALL).group())
First line
Second line
Third with matchdate and more
Fourth line
Fifth with matchdate
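If it helps, the same idea can be wrapped in a small function. This is only a sketch: the name up_to_second and the sample strings are made up for illustration and are not from the question. It uses re.match so the pattern is anchored at the start of the text, a non-capturing group, and re.escape in case the word contains regex metacharacters. It returns None when the word occurs fewer than two times.

```python
import re

# Hypothetical helper; the name and default argument are illustrative only.
def up_to_second(text, word="matchdate"):
    # (?:...){2} repeats "lazily consume anything, then the word" twice.
    # re.DOTALL lets the dot cross newlines.
    pattern = re.compile(r"(?:.*?" + re.escape(word) + r"){2}", re.DOTALL)
    m = pattern.match(text)  # match() only tries at the start of the text
    return m.group() if m else None

sample = "a\nmatchdate b\nc matchdate d\nmatchdate"
print(repr(up_to_second(sample)))
print(up_to_second("only one matchdate here"))
```

The first call should print 'a\nmatchdate b\nc matchdate' and the second should print None, since there is no second occurrence to stop at.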
Upvotes: 3
Reputation: 14313
There are a few ways you can do this:
Remove the g flag. Without the global flag, the regex will only grab the first match it encounters.
https://regex101.com/r/Cjyo0f/2
Add ^ to the front of the regex. A caret forces the regex to match from the beginning of the string, ruling out all other possibilities.
https://regex101.com/r/Cjyo0f/3
Use .split() and .join(). If regular Python is available, I would recommend:
string = "I like to matchdate, I want to eat matchdate for breakfast"
# Keep the first two pieces and rejoin them; the result ends just before the second "matchdate".
print("matchdate".join(string.split("matchdate")[:2]))
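Note that the split/join version stops just before the second "matchdate". If the second occurrence should be included in the result, as the regex versions do, one option is two str.find calls. This is a rough sketch; the function name through_second and the sample sentence are assumptions made up for illustration.

```python
# Hypothetical helper; the name and sample text are illustrative only.
def through_second(text, word="matchdate"):
    first = text.find(word)
    if first == -1:
        return None  # the word never occurs
    # Start the second search right after the first occurrence.
    second = text.find(word, first + len(word))
    if second == -1:
        return None  # the word occurs only once
    # Slice up to and including the second occurrence.
    return text[:second + len(word)]

sentence = "one matchdate two matchdate three matchdate"
print(through_second(sentence))
```

This should print "one matchdate two matchdate". It avoids regex entirely, so newlines in the page need no special handling.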
Upvotes: 2