breakbotz

Reputation: 407

Python Regex: Group Backref

I'm attempting to modify strings using the re.sub() function from the re module. More specifically, I'm trying to use a group backreference, but the function doesn't seem to recognize it. For example:

In [49]: s = ' STORE # 123  123 '
In [50]: print re.sub('([0-9]+) +(\1)','(\1)',s)
 STORE # 123  123

I want it to print "STORE # 123", but the pattern I pass as the first argument of re.sub() never seems to match, so it just returns the original string unmodified. I've checked the documentation (https://docs.python.org/2/library/re.html#re.sub) and still can't figure out what I'm doing wrong. I'm running Python 2.7, by the way.

Thanks for the help!

Upvotes: 1

Views: 288

Answers (2)

Casimir et Hippolyte

Reputation: 89557

You can keep the part you want by leaving it out of the match. To do that, you only need to enclose the backreference in a lookahead assertion, (?=...), which checks that the repeated number follows without consuming it:

print re.sub(r'([0-9]+) +(?=\1)','',s)
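A self-contained sketch of the lookahead approach, assuming the question's sample string; it should run on Python 2.7 and 3, since a single-argument print(...) works in both. The variable names are illustrative. repr() keeps the leading and trailing spaces visible, and .strip() trims them to get exactly "STORE # 123". The last two lines show an optional refinement that is not part of the answer above: word boundaries (\b) so that a number is not treated as repeated just because the next number starts with the same digits.

```python
import re

s = ' STORE # 123  123 '

# (?=\1) requires the captured number to appear again after the spaces,
# but a lookahead consumes nothing, so only the first "123" and the
# spaces after it are matched and replaced by ''.
result = re.sub(r'([0-9]+) +(?=\1)', '', s)
print(repr(result))          # ' STORE # 123 '
print(repr(result.strip()))  # 'STORE # 123'

# Caveat: \1 only has to match the start of the following text, so in
# "12 123" the "12" counts as a repeat. \b on both sides prevents that.
naive = re.sub(r'([0-9]+) +(?=\1)', '', '12 123')
bounded = re.sub(r'\b([0-9]+) +(?=\1\b)', '', '12 123')
print(repr(naive))    # '123'
print(repr(bounded))  # '12 123'
```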

Upvotes: 0

JuniorCompressor

Reputation: 20025

You should use:

>>> print re.sub(r'([0-9]+) +\1', r'(\1)', ' STORE # 123  123 ')
 STORE # (123) 

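Use raw strings (r'...') so that \1 reaches the regex engine as a backreference. In an ordinary string literal, '\1' is an octal escape for the character chr(1), so your pattern was looking for that control character instead of a repeat of the number. Note also that the parentheses in the replacement are literal text; use r'\1' instead of r'(\1)' if you don't want them in the output.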
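A short sketch contrasting the plain and raw string versions on the question's sample string; it should run on Python 2.7 and 3, and the variable names are illustrative. repr() is used so that the control character and the surrounding spaces stay visible.

```python
import re

s = ' STORE # 123  123 '

# In an ordinary string literal, '\1' is the octal escape for chr(1),
# so the regex engine never sees a backreference.
print(repr('\1'))    # '\x01'
print(repr(r'\1'))   # '\\1'  (a backslash followed by "1")

# Non-raw pattern: digits, spaces, then a literal chr(1), which does not
# occur in s, so nothing is replaced.
unchanged = re.sub('([0-9]+) +(\1)', '(\1)', s)

# Raw strings pass \1 through to re as a backreference.
with_parens = re.sub(r'([0-9]+) +\1', r'(\1)', s)
# Same match, but the replacement without the literal parentheses.
deduped = re.sub(r'([0-9]+) +\1', r'\1', s)

print(repr(unchanged))    # ' STORE # 123  123 '
print(repr(with_parens))  # ' STORE # (123) '
print(repr(deduped))      # ' STORE # 123 '
```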

Upvotes: 2
