Reputation: 202
For example, I have the string "perfect bear hunts" and I want to replace the word before the occurrence of "bear" with the word "the".
So the resulting string would be "the bear hunts".
I thought I would use
re.sub(r"\w+ bear", "the", "perfect bear hunts")
but it replaces "bear" too. How do I exclude "bear" from being replaced while still using it in the match?
Upvotes: 3
Views: 3307
Reputation: 6108
Like the other answers, I’d use a positive lookahead assertion.
Then to fix the issue raised by Rawing in a couple of the comments (what about words like “beard”?), I’d add (\b|$). This matches a word boundary or the end of the string, so you only match on the word bear, and nothing longer.
So you get the following:
import re

def bear_replace(string):
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)
and test cases (using pytest):
import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),
    # We only replace the first word before 'bear'
    ("before perfect bear swims", "before the bear swims"),
    # 'beard' isn't matched
    ("a perfect beard", "a perfect beard"),
    # We handle the case where 'bear' is the end of the string
    ("perfect bear", "the bear"),
    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected
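A side note on the pattern: in Python's re module, \b also matches at the end of the string when the last character is a word character, so \b on its own should behave the same as (\b|$) for these cases. A minimal sketch of that variant (the name bear_replace_simple is made up for illustration):

```python
import re

def bear_replace_simple(string):
    # \b matches between "r" and any non-word character, and also at the
    # end of the string, so no separate "$" alternative is used here.
    return re.sub(r"\w+ (?=bear\b)", "the ", string)

print(bear_replace_simple("perfect bear"))
print(bear_replace_simple("a perfect beard"))
print(bear_replace_simple("perfect bear-string"))
```

Keeping (\b|$) is harmless; it just states the end-of-string case explicitly.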
Upvotes: 2
Reputation: 16763
Lookbehind and lookahead regular expressions are what you are looking for.
re.sub(".+(?=bear)", "the ", "perfect bear swims")
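One thing to keep in mind with this pattern: .+ is greedy, so it replaces everything up to the last "bear" on the line, not just the single word before it. A small sketch comparing it with \w+ followed by a space (the variable names and the longer input strings are made up for illustration):

```python
import re

text = "a big perfect bear swims"

# Greedy .+ swallows every character before "bear".
greedy = re.sub(r".+(?=bear)", "the ", text)

# \w+ plus a space only replaces the one word directly before "bear".
one_word = re.sub(r"\w+ (?=bear)", "the ", text)

# With two occurrences, .+ runs up to the last "bear".
two_bears = "perfect bear sees brown bear"
greedy_two = re.sub(r".+(?=bear)", "the ", two_bears)
one_word_two = re.sub(r"\w+ (?=bear)", "the ", two_bears)

print(greedy)        # the bear swims
print(one_word)      # a big the bear swims
print(greedy_two)    # the bear
print(one_word_two)  # the bear sees the bear
```

Which behavior is wanted depends on whether "the word before" or "everything before" is the goal.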
Upvotes: 1
Reputation: 5108
Use a positive lookahead to replace everything before "bear":
re.sub(".+(?=bear )","the ","perfect bear swims")
.+ matches one or more of any character (except for line terminators).
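Note that the lookahead above requires a space after "bear", so it will not match when "bear" ends the string or is followed by punctuation. A quick sketch of that behavior, with \b shown as one possible alternative (the variable names and extra inputs are made up for illustration):

```python
import re

pattern = r".+(?=bear )"

with_space = re.sub(pattern, "the ", "perfect bear swims")
at_end = re.sub(pattern, "the ", "perfect bear")
with_comma = re.sub(pattern, "the ", "perfect bear, swims")

# A word boundary instead of a literal space also covers the other cases.
at_end_boundary = re.sub(r".+(?=bear\b)", "the ", "perfect bear")

print(with_space)       # the bear swims
print(at_end)           # perfect bear (no match, unchanged)
print(with_comma)       # perfect bear, swims (no match, unchanged)
print(at_end_boundary)  # the bear
```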
Upvotes: 1
Reputation: 8234
An alternative to using lookaheads:
Capture the part you want to keep using a group () and reinsert it using \1 in the replacement.
re.sub(r"\w+ (bear)", r"the \1", "perfect bear swims")
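The capture-group approach can be combined with a word boundary so that words such as "beard" are left alone. A minimal sketch, assuming that is the desired behavior (the function name replace_before_bear is made up for illustration):

```python
import re

def replace_before_bear(text):
    # Group 1 holds "bear" and is written back via \1;
    # \b stops the pattern from matching inside "beard".
    return re.sub(r"\w+ (bear\b)", r"the \1", text)

print(replace_before_bear("perfect bear swims"))
print(replace_before_bear("a perfect beard"))
print(replace_before_bear("perfect bear"))
```

Unlike the lookahead versions, this pattern consumes "bear" as part of the match, which is why it has to be put back in the replacement.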
Upvotes: 1