macklin

Reputation: 375

Python reference to regex in parentheses

I have a text file that needs to have the letter 't' removed if it is not immediately preceded by a number.

I am trying to do this using re.sub and I have this:

import re
with open('File.txt') as fh:
    g = fh.read()
g = re.sub('([^0-9])t', '', g)

This matches the right letters, but it also removes the preceding character. How can I refer to the parenthesized group in the replacement string? Thanks!

Upvotes: 0

Views: 115

Answers (2)

Jerry

Reputation: 71578

Three options:

g=re.sub('([^0-9])t','\\1',g)

or

g=re.sub('(?<=[^0-9])t','',g)

or

g=re.sub('(?<![0-9])t','',g)

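The first option is what you asked for: a backreference to the captured text. \\1 in a normal string literal (or \1 in a raw string, r'\1') refers to the first capture group, so the character matched by [^0-9] is put back and only the t is dropped.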
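As a rough illustration of the backreference approach, the sketch below runs the first option on a made-up sample string (the sample text and variable names are assumptions for the demo, not from the question). It also shows the raw-string and \g<1> spellings, which behave the same way in re.sub.

```python
import re

# Hypothetical sample text, chosen only for this demonstration.
text = "10t but 3t"
pattern = r'([^0-9])t'

# Without a backreference the captured character is lost along with the t.
lost_char = re.sub(pattern, '', text)          # "10t b 3t"

# Three equivalent ways to put the captured character back.
with_escaped = re.sub(pattern, '\\1', text)    # escaped backslash in a normal string
with_raw = re.sub(pattern, r'\1', text)        # raw string, no doubling needed
with_explicit = re.sub(pattern, r'\g<1>', text)  # unambiguous group syntax

print(lost_char)
print(with_escaped, with_raw, with_explicit, sep=" | ")
```

The \g<1> form is mainly useful when the backreference is followed directly by a digit in the replacement, where \1 followed by that digit would be read as a different group number.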

Lookarounds don't consume characters, so there is nothing to put back in the replacement. The second option uses a positive lookbehind and the third uses a negative lookbehind. Neither consumes the character it checks, so the [^0-9] or [0-9] never becomes part of the match. They are usually the better choice: because the preceding character is not consumed, it can still serve as context for the next match, so in a run such as 'tt' after a non-digit both letters are removed, whereas the capturing version removes only one of them.

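The positive lookbehind requires a non-digit character immediately before the t. The negative lookbehind requires only that there is no digit immediately before it. They differ at the very start of the text: a leading t has no character before it, so only the negative lookbehind matches it.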
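To make the differences between the three options concrete, here is a small comparison on an invented sample string (the sample and the variable names are assumptions for illustration only). It contains a t at the very start, a t after a digit, and two consecutive t's.

```python
import re

# Hypothetical sample: leading 't', a 't' after the digit 4, and a 'tt' run.
sample = "t4t att"

# Option 1: capture the preceding non-digit and put it back.
captured = re.sub(r'([^0-9])t', r'\1', sample)

# Option 2: positive lookbehind, a non-digit must precede the 't'.
positive = re.sub(r'(?<=[^0-9])t', '', sample)

# Option 3: negative lookbehind, no digit may precede the 't'.
negative = re.sub(r'(?<![0-9])t', '', sample)

print(repr(captured))  # 't4t at' : second 't' of the run survives
print(repr(positive))  # 't4t a'  : both 't's of the run removed, leading 't' kept
print(repr(negative))  # '4t a'   : leading 't' removed as well
```

If a t at the very beginning of the file should also be removed, which seems to match the wording of the question, the negative lookbehind is the only one of the three that does so.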

Upvotes: 3

Ignacio Vazquez-Abrams

Reputation: 799230

Use a lookbehind (or negative lookbehind) instead.

g=re.sub('(?<=[^0-9])t','',g)

or

g=re.sub('(?<![0-9])t','',g)

Upvotes: 4
