Reputation: 202

Python - re.sub without replacing a part of regex

So for example I have a string "perfect bear hunts" and I want to replace the word before the occurrence of "bear" with the word "the".

So the resulting string would be "the bear hunts"

I thought I would use

re.sub(r"\w+ bear", "the", "perfect bear hunts")

but it replaces "bear" too. How do I exclude bear from being replaced while also having it used in matching?

Upvotes: 3

Answers (4)

alexwlchan

Reputation: 6108

Like the other answers, I’d use a positive lookahead assertion.

Then to fix the issue raised by Rawing in a couple of the comments (what about words like “beard”?), I’d add (\b|$). This matches a word boundary or the end of the string, so you only match on the word bear, and nothing longer.

So you get the following:

import re

def bear_replace(string):
    return re.sub(r"\w+ (?=bear(\b|$))", "the ", string)

and test cases (using pytest):

import pytest

@pytest.mark.parametrize('string, expected', [
    ("perfect bear swims", "the bear swims"),

    # We only replace the first word before 'bear'
    ("before perfect bear swims", "before the bear swims"),

    # 'beard' isn't captured
    ("a perfect beard", "a perfect beard"),

    # We handle the case where 'bear' is the end of the string
    ("perfect bear", "the bear"),

    # 'bear' is followed by a non-space punctuation character
    ("perfect bear-string", "the bear-string"),
])
def test_bear_replace(string, expected):
    assert bear_replace(string) == expected
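
As a side note, \b also matches between a word character and the end of the string, so a shorter pattern without the explicit "|$" appears to behave the same way on the cases above. A minimal sketch, where bear_replace_short is a hypothetical name and not part of the original answer:

```python
import re

def bear_replace_short(string):
    # \b matches between a word character and the end of the string,
    # so the "|$" alternative is not needed for these inputs.
    return re.sub(r"\w+ (?=bear\b)", "the ", string)

print(bear_replace_short("perfect bear"))         # the bear
print(bear_replace_short("a perfect beard"))      # a perfect beard
print(bear_replace_short("perfect bear-string"))  # the bear-string
```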

Upvotes: 2

hspandher

Reputation: 16763

Lookbehind and lookahead assertions are what you are looking for.

re.sub(".+(?=bear)", "the ", "perfect bear swims")
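
One thing to keep in mind with this pattern: .+ is greedy, so it replaces everything before "bear", not only the preceding word. A small sketch comparing it with \w+ on an input of my own choosing (the variable name text is only for illustration):

```python
import re

text = "before perfect bear swims"

# Greedy ".+" consumes everything up to the last position followed by "bear".
print(re.sub(r".+(?=bear)", "the ", text))    # the bear swims

# "\w+ " limits the match to the single word (and space) before "bear".
print(re.sub(r"\w+ (?=bear)", "the ", text))  # before the bear swims
```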

Upvotes: 1

Igl3

Reputation: 5108

Use a Positive Lookahead to replace everything before bear:

re.sub(".+(?=bear )","the ","perfect bear swims")

.+ matches one or more of any character (except for line terminators). It is greedy, so it replaces everything before "bear ", not just the preceding word.
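
Because the lookahead here includes a trailing space, it only matches when "bear" is followed by a space. A short sketch of what that means in practice (the extra input strings are my own examples, not from the question):

```python
import re

pattern = r".+(?=bear )"  # the lookahead requires a space after "bear"

print(re.sub(pattern, "the ", "perfect bear swims"))   # the bear swims
print(re.sub(pattern, "the ", "perfect bear"))         # perfect bear (no match)
print(re.sub(pattern, "the ", "perfect beard swims"))  # perfect beard swims
```

So the space keeps "beard" from matching, at the cost of missing "bear" at the end of the string or before punctuation.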

Upvotes: 1

Felk

Reputation: 8234

An alternative to using lookaheads:

Capture the part you want to keep using a group () and reinsert it using \1 in the replacement.

re.sub(r"\w+ (bear)", r"the \1", "perfect bear swims")
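
If words like "beard" should be left alone, a word boundary can go inside the group. A minimal sketch, assuming that is the behaviour you want. It uses \g<1>, the explicit form of \1:

```python
import re

# (bear\b) captures the whole word "bear" only; \g<1> reinserts
# the captured text in the replacement.
pattern = r"\w+ (bear\b)"

print(re.sub(pattern, r"the \g<1>", "perfect bear swims"))  # the bear swims
print(re.sub(pattern, r"the \g<1>", "a perfect beard"))     # a perfect beard
```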

Upvotes: 1
