Mattia Verga

Reputation: 77

String replacement under regular expression not working as expected

I'm trying to search and replace parts of strings using re.sub and Python's string formatting. I want all text matching 'ESO \d+-\d+' to be rewritten in the form 'ESO \d{3}-\d{3}', padding each number with leading zeroes.

I thought that this would work:

re.sub(r"ESO (\d+)-(\d+)", "ESO {:0>3}-{:0>3}".format(r"\1", r"\2"), line)

But I get strange results:

'ESO 409-22' becomes 'ESO 0409-022'

'ESO 539-4' becomes 'ESO 0539-04'

I can't see the error. In fact, if I split it into two operations, I get the correct result:

>>> import re
>>> ricerca = re.search(r"ESO (\d+)-(\d+)", "ESO 409-22")
>>> print("ESO {:0>3}-{:0>3}".format(ricerca.group(1), ricerca.group(2)))
ESO 409-022

Upvotes: 0

Views: 109

Answers (1)

Alex Hall

Reputation: 36023

"ESO {:0>3}-{:0>3}".format(r"\1", r"\2")

evaluates to the same as:

r"ESO 0\1-0\2"

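because r"\1" and r"\2" are each only two characters long (a backslash and a digit), so padding them to width 3 adds a single 0. The format call runs before re.sub ever sees the match; the group substitution then proceeds normally on that string, so it just puts a 0 in front of each number.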
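To see the order of evaluation, here is a minimal sketch that builds the replacement string separately and inspects it before passing it to re.sub. The variable name repl is only for illustration:

```python
import re

# The replacement argument is fully evaluated before re.sub is called.
repl = "ESO {:0>3}-{:0>3}".format(r"\1", r"\2")

# r"\1" is two characters (a backslash and a digit), so padding it
# to width 3 adds exactly one zero in front of the backreference.
print(len(r"\1"))   # 2
print(repl)         # ESO 0\1-0\2

# re.sub then expands \1 and \2 inside the already-padded string.
print(re.sub(r"ESO (\d+)-(\d+)", repl, "ESO 409-22"))  # ESO 0409-022
```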

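Your last code sample is a very sensible way to solve this problem, so stick to it. If you really need to use re.sub, pass a function as the replacement, so that the formatting is applied to the matched text rather than to the literal backreferences: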

>>> import re
>>> line = 'ESO 409-22'
>>> re.sub(r"ESO (\d+)-(\d+)", lambda match: "ESO {:0>3}-{:0>3}".format(*match.groups()), line)
'ESO 409-022'
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.
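If the lambda gets hard to read, the same idea can be written with a named function and a precompiled pattern. This is only a sketch: the names ESO_PATTERN and pad_eso are made up for the example, and the sample line is invented to show that every occurrence in a string is handled.

```python
import re

# Hypothetical names, chosen for this example only.
ESO_PATTERN = re.compile(r"ESO (\d+)-(\d+)")

def pad_eso(match):
    """Return the matched designation with both numbers padded to 3 digits."""
    first, second = match.groups()
    return "ESO {:0>3}-{:0>3}".format(first, second)

# re.sub calls pad_eso once per match and uses its return value.
line = "ESO 409-22 and ESO 539-4"
result = ESO_PATTERN.sub(pad_eso, line)
print(result)  # ESO 409-022 and ESO 539-004
```

Numbers that already have three or more digits are left unchanged, since {:0>3} only pads and never truncates.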

Upvotes: 1
