Россарх

Reputation: 143

Python: string & list re.sub comparison

The string

txt = "this is a red house"

has come into existence. And then there is a list

patterns = ["thi", "a r", "use"]

with some matches for it.

The plan was to use a = re.sub("".join(patterns), "".join(patterns) + "^", txt), which I hoped would return thi^s is a r^ed house^. It doesn't: it just prints the original string again. If re.search is used with the same pattern, it returns None, so re.sub finds no match and simply returns the string unchanged.

I was ready to pull the plug on this, thinking that re.sub just can't be used the way I thought it could, and then I accidentally tried it inside a simple loop:

for i in patterns:
    a = re.sub(i, i + "^", txt)
    print(a)

And suddenly it (almost) worked: thi^s is a red house [\n] this is a r^ed house [\n] this is a red house^. Now I can't just let it go. What is going on?

Upvotes: 0

Views: 3173

Answers (4)

Thunder

Reputation: 31

This gives you the result you're looking for:

import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

# Assign each result back to txt so the substitutions accumulate.
for s in patterns:
    txt = re.sub(s, s + '^', txt)
print(txt)

First, your print statement is inside the loop, hence the duplicate strings.

Second, your re.sub(...) is returning the changes to 'txt' for each pass through the loop. If you wish to accumulate the changes you need to assign the results back to 'txt'. Otherwise you will only see the latest substitution assigned to 'a'.

Third, "".join(patterns) results in a string "thia ruse" which will not match any part of 'txt'.
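
The third point can be checked directly. This is a minimal sketch that reuses the txt and patterns from the question; the names joined and result are only for illustration:

```python
import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

# Joining with an empty string glues the patterns into one literal pattern.
joined = "".join(patterns)
print(joined)  # thia ruse

# That character sequence never occurs in txt, so there is no match...
print(re.search(joined, txt))  # None

# ...and re.sub returns its input unchanged when nothing matches.
result = re.sub(joined, joined + "^", txt)
print(result == txt)  # True
```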

I hope this helps. Regular expressions are a discipline by themselves. I've been using them since the '80s and still need to check the docs. Keep going!

Upvotes: 2

Blender

Reputation: 298374

Your loop can be fixed completely if you replace a with txt:

for i in patterns:
    txt = re.sub(i, i + "^", txt)
    print(txt)

That way, you actually modify the text incrementally instead of performing each substitution on the original text and discarding the result:

this is a red house
thi^s is a red house
thi^s is a r^ed house
thi^s is a r^ed house^
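
The two behaviours can be put side by side. This is a sketch only; the names original, separate and accumulated are chosen for illustration and are not from the question:

```python
import re

original = "this is a red house"
patterns = ["thi", "a r", "use"]

# The question's loop: every substitution starts from the untouched
# original, so each result carries only one marker.
separate = [re.sub(p, p + "^", original) for p in patterns]

# The fixed loop: each substitution works on the previous result.
accumulated = original
for p in patterns:
    accumulated = re.sub(p, p + "^", accumulated)

print(separate)
print(accumulated)  # thi^s is a r^ed house^
```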

Since you're not really using a regular expression in re.sub(), it'd be easier to just use str.replace:

for pattern in patterns:
    txt = txt.replace(pattern, pattern + '^')

If you actually want to use regular expressions, you'd have to do something like this:

patterns_regex = '(' + '|'.join(patterns) + ')'  # ['a', 'b'] -> '(a|b)'
print(re.sub(patterns_regex, r'\1^', txt))
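
If the patterns could ever contain regex metacharacters such as '.' or '+', one option is to escape each of them with re.escape before joining. That goes beyond the question, whose patterns are plain text, so treat this as a sketch under that assumption:

```python
import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

# re.escape makes each pattern match literally, whatever it contains.
patterns_regex = "(" + "|".join(re.escape(p) for p in patterns) + ")"

# \g<1> is the explicit spelling of \1: the text captured by group 1.
result = re.sub(patterns_regex, r"\g<1>^", txt)
print(result)  # thi^s is a r^ed house^
```

Unlike the loop, this scans the text once, so text inserted by one replacement is never matched again by a later pattern.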

Upvotes: 2

James

Reputation: 36691

You are not saving the substitutions at each iteration of your for loop. Try reassigning the substituted value back to txt.

import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

for i in patterns:
    txt = re.sub(i, i + "^", txt)
print(txt)
# prints:
# thi^s is a r^ed house^

Upvotes: 1

andrew_reece

Reputation: 21274

Join your patterns together with |, then use a function as the replacement argument of re.sub():

regex = re.compile("|".join([f"({p})" for p in patterns]))
regex.sub(lambda m: m.string[m.start():m.end()]+"^", txt)

# 'thi^s is a r^ed house^'

Note: If you don't want to use re.compile(), you can do it all in one line with:

re.sub("|".join([f"({p})" for p in patterns]), 
       lambda m: m.string[m.start():m.end()]+"^", 
       txt)
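
The slice m.string[m.start():m.end()] is the same text that m.group(0) returns, so the replacement function can be written a little shorter. A minimal sketch; the helper name add_marker is hypothetical:

```python
import re

txt = "this is a red house"
patterns = ["thi", "a r", "use"]

regex = re.compile("|".join(f"({p})" for p in patterns))

def add_marker(m):
    # m.group(0) is the whole substring matched by the regex.
    return m.group(0) + "^"

result = regex.sub(add_marker, txt)
print(result)  # thi^s is a r^ed house^
```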

Upvotes: 1
