Reputation: 40878
I would like to replace all occurrences of 3 or more "=" with an equal number of "-".
def f(a, b):
    '''
    Example
    =======
    >>> from x import y
    '''
    return a == b
becomes
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b  # don't touch
My working but hacky solution is to pass a lambda as the repl argument of re.sub() that grabs the length of each match:
>>> import re
>>> s = """
... def f(a, b):
...     '''
...     Example
...     =======
...     >>> from x import y
...     '''
...     return a == b"""
>>> eq = r'(={3,})'
>>> print(re.sub(eq, lambda x: '-' * (x.end() - x.start()), s))
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b
Can I do this without needing to pass a function to re.sub()?
My thinking would be that I'd need r'(=){3,}' (a variable-length capturing group), but re.sub(r'(=){3,}', '-', s) has a problem with greediness, I believe.
Can I modify the regex eq above so that the lambda isn't needed?
Upvotes: 7
Views: 2136
Reputation: 2054
The question explicitly asks for a solution that doesn't use a function, but for completeness and for someone who is looking for a clearer solution (that doesn't involve lots of regex tricks), it's possible to use a function as in Replacing a RegEx with a string of characters with the same length:
re.sub('={3,}', lambda x: '-' * len(x.group()), s)
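For reuse, the same idea can be wrapped in a small helper. This is only a minimal sketch: the names RUN_OF_EQUALS and dashes_for_equals are made up for illustration and do not come from the answer above.

```python
import re

# Hypothetical names, chosen for illustration only.
RUN_OF_EQUALS = re.compile(r'={3,}')


def dashes_for_equals(text):
    """Replace each run of 3 or more '=' with the same number of '-'."""
    # m.group() is the whole matched run, so its length is the run length.
    return RUN_OF_EQUALS.sub(lambda m: '-' * len(m.group()), text)


print(dashes_for_equals('Example\n=======\nreturn a == b'))
```

Compiling the pattern once keeps the call site short; the replacement logic is unchanged from the one-liner.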
Upvotes: 2
Reputation: 89547
Using the third-party regex module (which supports \G, unlike the standard re module), you can write:
regex.sub(r'\G(?!\A)=|=(?===)', '-', s)
\G is the position immediately after the last successful match, or the start of the string. (?!\A) forces the first branch to fail at the start of the string.
The second branch =(?===) succeeds when a = is followed by two other =. The following matches then use the first branch \G(?!\A)= until there are no more consecutive =.
Upvotes: 2
Reputation: 402333
Using re.sub, this uses some deceptive lookahead trickery and works assuming your pattern-to-replace is always followed by a newline '\n'.
print(re.sub('=(?=={2}|=?\n)', '-', s))
def f(a, b):
    '''
    Example
    -------
    >>> from x import y
    '''
    return a == b
Details
"Replace an equal sign if it is succeeded by two equal signs or an optional equal sign and newline."
= # equal sign if
(?=={2} # lookahead
| # regex OR
=? # optional equal sign
\n # newline
)
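The newline assumption matters in practice. The following sketch runs the same pattern on three made-up sample strings (not taken from the question) to show where it holds and where it does not:

```python
import re

pattern = r'=(?=={2}|=?\n)'

# A run followed by a newline: every '=' is replaced.
heading = re.sub(pattern, '-', 'Example\n=======\n')

# A run followed by other text: only the first '=' still has
# two more '=' after it, so the last two are left alone.
inline = re.sub(pattern, '-', '=== x')

# A '==' at the end of a line is replaced as well, because each '='
# is followed by an optional '=' and then a newline.
trailing = re.sub(pattern, '-', 'a ==\n')

print(repr(heading))
print(repr(inline))
print(repr(trailing))
```

So the pattern is only safe when every run of 3 or more "=" ends its line and no shorter run does.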
Upvotes: 2
Reputation: 43136
It's possible, but not advisable.
The way re.sub works is that it finds a complete match and then replaces it. It doesn't replace each capture group separately, so things like re.sub(r'(=){3,}', '-', s) won't work - that'll replace the entire match with a single dash, not each occurrence of the = character.
>>> re.sub(r'(=){3,}', '-', '=== ===')
'- -'
So if you want to avoid a lambda, you have to write a regex that matches individual = characters - but only if there are at least 3 of them in a row. This is, of course, much more difficult than matching 3 or more = characters with the simple pattern ={3,}. It requires some use of lookarounds and looks like this:
(?<===)=|(?<==)=(?==)|=(?===)
This does what you want:
>>> re.sub(r'(?<===)=|(?<==)=(?==)|=(?===)', '-', '= == === ======')
'= == --- ------'
But it's clearly much less readable than the original lambda solution.
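As a sanity check, the lookaround pattern can be compared against the lambda version on a handful of inputs. This is a rough sketch; the helper names and the sample strings are invented for illustration.

```python
import re

# One alternative per position of a '=' inside a run of 3 or more:
# last (preceded by '=='), middle ('=' on both sides), first (followed by '==').
LOOKAROUND = r'(?<===)=|(?<==)=(?==)|=(?===)'


def with_lookarounds(text):
    # Replaces one '=' per match; no function is needed.
    return re.sub(LOOKAROUND, '-', text)


def with_lambda(text):
    # Replaces a whole run per match; the length is computed in Python.
    return re.sub(r'={3,}', lambda m: '-' * len(m.group()), text)


samples = ['', '=', '==', '===', '====', 'a == b', '=== ==', 'x=====y==z===']
for sample in samples:
    assert with_lookarounds(sample) == with_lambda(sample), sample

print(with_lookarounds('x=====y==z==='))
```

Both versions agree on these samples, which matches the reasoning that every = in a run of 3 or more is the first, a middle, or the last character of some triple.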
Upvotes: 2
Reputation: 15738
With some help from lookahead/lookbehind, it is possible to replace character by character:
>>> re.sub("(=(?===)|(?<===)=|(?<==)=(?==))", "-", "=== == ======= asdlkfj")
'--- == ------- asdlkfj'
Upvotes: 3