Reputation: 1052
I am trying to use regex to replace a selected piece of text with a single word taken from that same text. I tried re.sub(), but it seems to treat the second argument (the word I want to replace the text with) as a plain string, not as a regex.
Here is my string:
I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> .
And here is my code:
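import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
# The regex of the form <ERR targ=...> .. </ERR>
select_text_regex = r"<ERR[^<]+<\/ERR>"
# The regex of the correct word that will replace the selected text of the form <ERR targ=...> .. </ERR>
correct_word_regex = r"targ=([^>]+)>"
line = re.sub(select_text_regex, correct_word_regex, line.rstrip())
print(line)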
I get:
I go to Bridgebrook i go out targ=([^>]+)> on Tuesday night i go to Youth targ=([^>]+)> .
My goal is:
I go to Bridgebrook i go out sometimes on Tuesday night i go to Youth club .
Does Python's regex support replacing a match with a word taken from that same match?
Upvotes: 0
Views: 274
Reputation: 1514
Here's another solution. I also rewrote the regex using "non-greedy" modifiers (putting ? after *) because I find it more readable. The group referenced by r"\1" is the first unnamed group, created with parentheses. I also used re.compile as a style preference, to reduce the number of arguments:
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
select_text_regex = re.compile(r"<ERR targ=(.*?)>.*?<\/ERR>")
# \1 is replaced by whatever the first (.*?) group captured
line = select_text_regex.sub(r"\1", line)
print(line)
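To see why the non-greedy ? matters and is not only about readability, here is a minimal sketch comparing a greedy .* pattern with the non-greedy .*? one on the same line. It uses plain re.sub, and the variable names greedy and lazy are just for illustration.

```python
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."

# Greedy: .* runs as far as it can, so one single match stretches from the
# first <ERR targ= up to the LAST </ERR>, and group 1 swallows the text
# between the two tags.
greedy = re.sub(r"<ERR targ=(.*)>.*</ERR>", r"\1", line)

# Non-greedy: .*? stops at the first > and at the first </ERR>, so each
# <ERR ...> ... </ERR> block is matched and replaced separately.
lazy = re.sub(r"<ERR targ=(.*?)>.*?</ERR>", r"\1", line)

print(greedy)
print(lazy)
```

Only the non-greedy version produces the sentence the question asks for; the greedy one leaves tag fragments behind.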
Named group alternative:
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
select_text_regex = re.compile(r"<ERR targ=(?P<to_replace>.*?)>.*?<\/ERR>")
line = select_text_regex.sub(r"\g<to_replace>", line)
print(line)
You can find some docs on group referencing here:
https://docs.python.org/3/library/re.html#regular-expression-syntax
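One detail from those docs that matters when writing replacement strings: \1 and \g<1> both refer to group 1, but \g<1> stays unambiguous when the replacement continues with a digit, whereas r"\12" is read as a reference to group 12. A small sketch; the sample text and the trailing 2 are made up for illustration.

```python
import re

text = "<ERR targ=club> clob </ERR>"
pattern = re.compile(r"<ERR targ=(.*?)>.*?</ERR>")

# \g<1> means "group 1" even when a digit follows it in the replacement.
with_g = pattern.sub(r"\g<1>2", text)
print(with_g)  # club2

# \12 is parsed as a reference to group 12, which this pattern does not have,
# so re.sub raises re.error instead of producing "club2".
try:
    pattern.sub(r"\12", text)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print("error:", error_message)
```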
Upvotes: 1
Reputation: 46
What you're looking for is regex capture groups. Instead of matching the text and then trying to replace it with another regex, put the part of the pattern you want to keep inside parentheses, then refer to it in the replacement with \1 (the number is the index of the capturing group).
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
# Capture the target word after targ=; [^>]+ stops at the closing > of the tag
select_text_regex = r"<ERR targ=([^>]+)>[^<]+<\/ERR>"
# \1 refers back to the first capturing group
correct_word_regex = r"\1"
line = re.sub(select_text_regex, correct_word_regex, line.rstrip())
print(line)
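As an alternative to a \1 template, re.sub also accepts a function as the replacement: it is called once per match with the match object, and its return value is inserted. That is handy if the replacement ever needs logic beyond copying a group. A sketch using the same kind of pattern; the helper name use_target is hypothetical.

```python
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."
err_tag = re.compile(r"<ERR targ=([^>]+)>[^<]+</ERR>")


def use_target(match: re.Match) -> str:
    """Return the replacement for one <ERR> block: the word captured in group 1."""
    return match.group(1)


# sub() calls use_target for every match and splices in its return value.
fixed = err_tag.sub(use_target, line)
print(fixed)
```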
Upvotes: 0
Reputation: 9597
You need to match the target word in the pattern, as a capturing group; you can't start an entirely new search in the replacement string!
Not tested, but this should do the job:
Replace r"<ERR targ=(.*?)>.*?</ERR>"
With r"\1"
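A minimal sketch that checks both points against the line from the question: a regex passed as the replacement is not searched for (apart from backslash escapes and group references, it is inserted literally), while capturing the word in the pattern makes it available as \1.

```python
import re

line = "I go to Bridgebrook i go out <ERR targ=sometimes> some times </ERR> on Tuesday night i go to Youth <ERR targ=club> clob </ERR> ."

# The question's approach: the "replacement regex" contains no backslashes,
# so re.sub inserts it as plain text for every match.
literal = re.sub(r"<ERR[^<]+</ERR>", r"targ=([^>]+)>", line)

# The suggested approach: capture the word in the pattern, reuse it via \1.
fixed = re.sub(r"<ERR targ=(.*?)>.*?</ERR>", r"\1", line)

print(literal)
print(fixed)
```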
Upvotes: 0