Is there any way to get around this limitation of re.sub? Verbose mode does not fully apply to the replacement pattern: whitespace and comments in it are not eliminated, although backreferences (here \g<test>) are interpreted properly.
import re
ft1 = r"""(?P<test>[0-9]+)"""
ft2 = r"""\g<test>and then: \g<test> #this remains"""
print(re.sub(ft1, ft2, "front 1234 back", flags=re.VERBOSE))  # Does not work: the comment is copied
# result: front 1234and then: 1234 #this remains back
re.VERBOSE does not apply to the replacement pattern. Is there a workaround, something simpler than working with the groups after an re.match?
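One simpler workaround, not proposed in this thread and offered only as a sketch: the replacement template is an ordinary string, so it can be assembled from adjacent string literals inside parentheses, which Python joins at compile time. Each piece can carry its own Python comment, and the whitespace between the pieces never reaches the template. The pattern itself still uses re.VERBOSE.

```python
import re

pattern = re.compile(r"""
    (?P<test>[0-9]+)   # one or more digits
""", re.VERBOSE)

# Adjacent string literals are concatenated by the compiler, so each
# piece can be commented; only text inside the quotes is kept.
template = (
    r"\g<test>"        # the matched digits
    r"and then: "      # literal text, including the trailing space
    r"\g<test>"        # the digits again
)

result = pattern.sub(template, "front 1234 back")
print(result)  # front 1234and then: 1234 back
```

The trade-off is that every literal space has to sit inside quotes, much like writing [ ] in a verbose match pattern.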
You can first use re.compile to compile the regular expression; that is where you can pass the re.VERBOSE flag.
Later, you can pass the compiled expression as the pattern argument to re.sub().
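A minimal sketch of that suggestion, reusing the question's group name and sample string. The flag is given once, to re.compile; re.sub raises ValueError if a non-zero flags argument accompanies an already compiled pattern. The compiled flag still governs only the pattern, so the comment in the template is copied through unchanged.

```python
import re

# The VERBOSE flag is stored in the compiled pattern object.
pattern = re.compile(r"""
    (?P<test>[0-9]+)   # one or more digits
""", re.VERBOSE)

template = r"\g<test>and then: \g<test> #this remains"

# Passing the compiled pattern to re.sub is equivalent to calling its .sub method.
via_module = re.sub(pattern, template, "front 1234 back")
via_method = pattern.sub(template, "front 1234 back")
print(via_module)  # front 1234and then: 1234 #this remains back

# Repeating the flag alongside a compiled pattern is rejected.
try:
    re.sub(pattern, template, "front 1234 back", flags=re.VERBOSE)
except ValueError as exc:
    print("ValueError:", exc)
```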
Here is the only way I have found to "compile" a replacement expression for re.sub. There are a few extra constraints: literal spaces and newlines have to be written in square brackets, the way a literal space is written in a verbose match expression ([ ] for a space, [\n\n\n] for three newlines), and the whole replacement expression should start with a newline, as in the triple-quoted string below.
An example: this searches a string for a word that is repeated at the start of an <ins> element and of the <del> element after it, then replaces those occurrences with a single occurrence of the word in front of <ins>.
Both the match and the replace expressions are complex, which is why I want a verbose version of the replace expression.
===========================
import re
test = "<p>Le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>"
find=r"""
<ins>
(?P<front>[^<]+) #there is something added that matches
(?P<delim1>[ .!,;:]+) #get delimiter
(?P<back1>[^<]*?)
</ins>
[ ]
<del>
(?P=front)
(?P<delim2>[ .!,;:]+)
(?P<back2>[^<]*?)
</del>
"""
replace = r"""
<<<<<\g<front>>>>> #Pop out in front matching thing
<ins>
\g<delim1>
\g<back1>
</ins>
[ ]
<del>
\g<delim2> #put delimiters and backend back
\g<back2>
</del>
"""
flatReplace = r"""<<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>"""
def compileRepl(inString):
    outString = inString
    # strip indentation at the start of each line
    outString = re.sub(r"\n\s+", "\n", outString)
    # strip spaces/tabs at the end of each line, keeping the newline
    # (deleting the newline would let a comment swallow the next line)
    outString = re.sub(r"[ \t]+\n", "\n", outString)
    # get rid of comments
    outString = re.sub(r"\s*#[^\n]*\n", "\n", outString)
    # preserve whitespace in brackets ([ ] -> " ") and eliminate the brackets
    outString = re.sub(r"(?<!\[)\[(\s+)\](?!\[)", r"\1", outString)
    # get rid of newlines not in brackets
    outString = re.sub(r"(?<!\[)(\n)+(?!\])", "", outString)
    # unwrap bracketed \n escapes ([\n\n] -> \n\n); re.sub expands them to newlines
    outString = re.sub(r"\[((\\n)+)\]", r"\1", outString)
    # [[...]] -> [...], so doubled brackets give literal brackets
    outString = re.sub(r"\[\[(.*?)\]\]", r"[\1]", outString)
    return outString
assert flatReplace == compileRepl(replace)
print(test)
print(compileRepl(replace))
print(re.sub(find, compileRepl(replace), test, flags=re.VERBOSE))
#<p>Le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>
#<<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>
#<p>Le petit <<<<<homme>>>><ins> à</ins> <del> en</del> ressorts</p>
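To see the bracket conventions described above on a smaller input, here is a sketch that repeats compileRepl so the block runs on its own (its end-of-line step only trims spaces and tabs, keeping the newline). The group name word, the sample template and the input "hi!" are made up for illustration. [ ] yields a space, [\n] yields an escape that re.sub turns into a real newline, and doubled brackets [[x]] yield a literal [x].

```python
import re

def compileRepl(inString):
    """Flatten a 'verbose' replacement template (copy of the helper above)."""
    outString = inString
    outString = re.sub(r"\n\s+", "\n", outString)                    # indentation
    outString = re.sub(r"[ \t]+\n", "\n", outString)                  # trailing blanks
    outString = re.sub(r"\s*#[^\n]*\n", "\n", outString)              # comments
    outString = re.sub(r"(?<!\[)\[(\s+)\](?!\[)", r"\1", outString)   # [ ] -> " "
    outString = re.sub(r"(?<!\[)(\n)+(?!\])", "", outString)          # layout newlines
    outString = re.sub(r"\[((\\n)+)\]", r"\1", outString)             # [\n] -> \n escape
    outString = re.sub(r"\[\[(.*?)\]\]", r"[\1]", outString)          # [[x]] -> [x]
    return outString

# Hypothetical verbose template: indentation, comments and layout newlines
# are dropped; only the bracketed pieces produce whitespace.
verbose = r"""
    \g<word>[ ]and[ ]\g<word>   # the word twice, space-separated
    [\n]                        # one real newline
    [[x]]                       # a literal "[x]"
"""

flat = compileRepl(verbose)
print(flat)  # \g<word> and \g<word>\n[x]

result = re.sub(r"(?P<word>\w+)!", flat, "hi!")
print(repr(result))  # 'hi and hi\n[x]'
```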