Reputation: 143
I have the string
txt = "this is a red house"
and a list of patterns, each of which matches part of it:
patterns = ["thi", "a r", "use"]
The plan was to use
a = re.sub("".join(patterns), "".join(patterns) + "^", txt)
which I hoped would return thi^s is a r^ed house^. Not so much: it just returns the original string. If re.search is used instead, it returns None, so the reason is that re.sub doesn't find anything and simply returns the string unchanged.
I was ready to give up on this, thinking that re.sub just can't be used the way I thought it could, and then I accidentally tried it inside a simple loop:
for i in patterns:
    a = re.sub(i, i + "^", txt)
    print(a)
And suddenly it (almost) worked:
thi^s is a red house
this is a r^ed house
this is a red house^
Now I can't just let it go. What is going on?
Upvotes: 0
Views: 3173
Reputation: 31
This gives you the result you're looking for:
import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]
for s in patterns:
    txt = re.sub(s, s + '^', txt)
print(txt)
First, your print statement is inside the loop, hence the three separate lines of output.
Second, each call to re.sub(...) works on the original, unmodified 'txt' and returns a new string; 'txt' itself never changes. If you want to accumulate the changes, you need to assign each result back to 'txt'. Otherwise 'a' only ever holds the result of the latest substitution.
Third, "".join(patterns) produces the string "thia ruse", which does not occur anywhere in 'txt', so there is nothing to substitute.
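A quick check of the second and third points, using the question's txt and patterns (a small sketch added for illustration, not part of the original answer):

```python
import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

# Joining with "" simply concatenates the patterns into one literal string.
joined = "".join(patterns)
print(joined)                  # thia ruse
print(re.search(joined, txt))  # None: "thia ruse" never occurs in txt

# re.sub returns a new string; txt itself is never modified (str is immutable).
a = re.sub(patterns[0], patterns[0] + "^", txt)
print(a)    # thi^s is a red house
print(txt)  # this is a red house
```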
I hope this helps. Regular expressions are a discipline in themselves. I've been using them since the '80s and still need to check the docs. Keep going!
Upvotes: 2
Reputation: 298374
Your loop can be fixed completely if you replace a with txt:
for i in patterns:
    txt = re.sub(i, i + "^", txt)
print(txt)
That way, you modify the text incrementally instead of performing each substitution on the original text and discarding the result. Starting from the original string, txt goes through these states:
this is a red house
thi^s is a red house
thi^s is a r^ed house
thi^s is a r^ed house^
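This incremental update is a left fold over the patterns. As an alternative sketch (functools.reduce is swapped in here; it is not what the answer used), the same loop can be written without reassigning txt by hand:

```python
import re
from functools import reduce

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

# Each step feeds the previous result into the next substitution,
# exactly like `txt = re.sub(i, i + "^", txt)` inside the loop.
result = reduce(lambda s, p: re.sub(p, p + "^", s), patterns, txt)
print(result)  # thi^s is a r^ed house^
```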
Since none of your patterns contain regex metacharacters, you're not really using regular expressions with re.sub(), so it would be simpler to use str.replace:
for pattern in patterns:
    txt = txt.replace(pattern, pattern + '^')
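One reason str.replace is the safer choice for literal text: re.sub interprets its first argument as a regular expression. The sketch below uses a hypothetical pattern "a.r" (not from the question) to show the difference, with re.escape as the way to keep re.sub literal:

```python
import re

txt = "this is a red house"

# str.replace looks for the literal text "a.r", which is not in txt.
print(txt.replace("a.r", "a.r^"))  # this is a red house

# re.sub treats "." as "any character", so "a.r" matches "a r".
print(re.sub("a.r", "a.r^", txt))  # this is a.r^ed house

# re.escape turns the pattern back into a literal match.
print(re.sub(re.escape("a.r"), "a.r^", txt))  # this is a red house
```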
If you actually want to use regular expressions, you'd have to do something like this:
patterns_regex = '(' + '|'.join(patterns) + ')' # ['a', 'b'] -> '(a|b)'
print(re.sub(patterns_regex, r'\1^', txt))
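A variant of the same idea, assuming the patterns should always be treated as literal text: escape each one with re.escape before joining, and use \g&lt;0&gt; (the whole match) in the replacement so no explicit capture group is needed. This is a sketch, not the answer's own code:

```python
import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

# Escape each pattern so characters like "." or "(" match literally,
# then combine them into a single alternation.
patterns_regex = "|".join(map(re.escape, patterns))

# \g<0> refers to the entire match; one pass handles all patterns.
result = re.sub(patterns_regex, r"\g<0>^", txt)
print(result)  # thi^s is a r^ed house^
```

Unlike the sequential loop, a single pass never re-scans text it has already replaced. When patterns overlap, the alternation tries its alternatives left to right at each position, so their order in the list can matter.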
Upvotes: 2
Reputation: 36691
You are not saving the substitutions at each iteration of your for loop. Try reassigning the substituted value back to txt.
import re
txt = "this is a red house"
patterns = ["thi", "a r", "use"]
for i in patterns:
    txt = re.sub(i, i + "^", txt)
print(txt)
# prints:
# thi^s is a r^ed house^
Upvotes: 1
Reputation: 21274
Join your patterns together with |, then use a function as the replacement argument of re.sub():
import re
regex = re.compile("|".join([f"({p})" for p in patterns]))
regex.sub(lambda m: m.string[m.start():m.end()] + "^", txt)
# 'thi^s is a r^ed house^'
Note: if you don't want to use re.compile(), you can do it all in a single call:
re.sub("|".join([f"({p})" for p in patterns]),
       lambda m: m.string[m.start():m.end()] + "^",
       txt)
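The slice m.string[m.start():m.end()] is the text of the whole match, which the match object also exposes directly as m.group(0). A slightly shorter sketch of the same approach, additionally assuming the patterns are meant literally (hence re.escape):

```python
import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

regex = re.compile("|".join(re.escape(p) for p in patterns))

# The replacement function receives each match object and returns
# the string to put in its place: the matched text plus a caret.
result = regex.sub(lambda m: m.group(0) + "^", txt)
print(result)  # thi^s is a r^ed house^
```

One advantage of a function over a replacement string is that its return value is inserted as-is, so backslashes in it are not treated as escapes.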
Upvotes: 1