halbrd

Reputation: 135

Python regex - Replace bracketed text with contents of brackets

I'm trying to write a Python function that replaces instances of text surrounded with curly braces with the contents of the braces, while leaving empty brace-pairs alone. For example:

foo {} bar {baz} would become foo {} bar baz.

The pattern that I've created to match this is {[^{}]+}, i.e. some text that doesn't contain curly braces (to prevent overlapping matches) surrounded by a set of curly braces.

The obvious solution is to use re.sub with my pattern, and I've found that I can reference the matched text with \g<0>:

>>> re.sub("{[^{}]+}", r"A \g<0> B", "foo {} bar {baz}")
'foo {} bar A {baz} B'

So that's no problem. However, I'm stuck on how to trim the braces from the referenced text. If I try slicing the replacement string:

>>> re.sub("{[^{}]+}", r"\g<0>"[1:-1], "foo{}bar{baz}")
'foo{}barg<0'

The slice is applied before \g<0> is resolved to the matched text, so it trims the leading \ and trailing >, leaving just g<0, which has no special meaning.

I also tried defining a function to perform the trimming:

def trimBraces(string):
    return string[1:-1]

But, unsurprisingly, that didn't change anything.

>>> re.sub("{[^{}]+}", trimBraces(r"\g<0>"), "foo{}bar{baz}")
'foo{}barg<0'

What am I missing here? Many thanks in advance.

Upvotes: 4

Views: 2808

Answers (2)

Wiktor Stribiżew

Reputation: 626758

When you use "\g<0>"[1:-1] as a replacement pattern, you only slice the "\g<0>" string, not the actual value this backreference refers to.

If you want to keep your "trimming" approach, pass a callable as the replacement argument: re.sub calls it with the match object for each match and uses its return value as the replacement text:

re.sub("{[^{}]+}", lambda m: m.group()[1:-1], "foo{}bar{baz}")
# => foo{}barbaz

Note that m.group() is the equivalent of \g<0> in your replacement string, i.e. the whole match value.
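
A minimal sketch of the same idea with a named function instead of a lambda, which may read better if the trimming logic grows. The names strip_braces and unwrap are made up for illustration and are not part of the question:

```python
import re

def strip_braces(match):
    # match.group(0) is the whole match, e.g. "{baz}"; drop the first and last character.
    return match.group(0)[1:-1]

def unwrap(text):
    # Passing the function itself (not its result) lets re.sub call it once per match.
    return re.sub(r"{[^{}]+}", strip_braces, text)

print(unwrap("foo {} bar {baz}"))  # foo {} bar baz
```

The key difference from trimBraces("\g<0>") is that the function is handed to re.sub uncalled, so the slicing happens on the matched text rather than on the literal replacement template.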

However, using a capturing group is a more "organic" solution; see alecxe's answer.

Upvotes: 2

alecxe

Reputation: 473853

You can use a capturing group to replace a part of the match:

>>> re.sub(r"{([^{}]+)}", r"\1", "foo{}bar{baz}")
'foo{}barbaz'
>>> re.sub(r"{([^{}]+)}", r"\1", "foo {} bar {baz}")
'foo {} bar baz'
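
If the substitution is needed in several places, one possible way to package it is to compile the pattern once and call the compiled object's sub method. This is only a sketch; BRACED and unwrap_braces are hypothetical names, not something from the question:

```python
import re

# Hypothetical names, for illustration only.
# Group 1 captures the non-empty, brace-free text between the braces.
BRACED = re.compile(r"{([^{}]+)}")

def unwrap_braces(text):
    # \1 in the replacement is the text captured by group 1, so the braces are dropped.
    return BRACED.sub(r"\1", text)

print(unwrap_braces("foo {} bar {baz}"))  # foo {} bar baz
print(unwrap_braces("{{a}}"))             # {a}
```

Empty pairs are left alone because + requires at least one character between the braces, and a single pass only strips the innermost layer of nested braces.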

Upvotes: 4
