Multiple Regex Search and Replace

Question

I'm trying to create a simple script that takes regular expressions from one file and carries out the searches and replacements on another file. This is what I have, but it doesn't work: the file is unchanged. What am I doing wrong?

import re, fileinput

separator = ' => '

file = open("searches.txt", "r")

for search in file:
    pattern, replacement = search.split(separator)
    pattern = 'r"""' + pattern + '"""'
    replacement = 'r"""' + replacement + '"""'
    for line in fileinput.input("test.txt", inplace=1):
        line = re.sub(pattern, replacement, line)
        print(line, end="")

The file searches.txt looks like this:

.+?)
 => 
().+?() => \1This was changed by the script\2

and test.txt like this:

This is an element with the test class
This is an element without the test class
This is another element with the test class

I did a test to see if it's getting the expression from the file correctly:

>>> separator = ' => '
>>> file = open("searches.txt", "r")
>>> for search in file:
...     pattern, replacement = search.split(separator)
...     pattern = 'r"""' + pattern + '"""'
...     replacement = 'r"""' + replacement + '"""'
...     print(pattern)
...     print(replacement)
... 
r""".+?)"""
r"""
"""
r"""().+?()"""
r"""\1This was changed by the script\2"""

The closing triple quotes on the first replacement are on a new line for some reason. Could this be the cause of my problem?

audiodude · Accepted Answer

You don't need

pattern = 'r"""' + pattern + '"""'

In the call to re.sub, pattern should be the actual regex, exactly as it was read from searches.txt. So

.+?)

should be passed as is. When you wrap it in all those extra quote characters, they become part of the regex itself, so the pattern never matches the text in your file.
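To make the point above concrete, here is a minimal sketch. The sample text and the pattern `w.+d` are made up for illustration and are not taken from the question's files.

```python
import re

line = "hello world"   # hypothetical sample text
pattern = "w.+d"       # hypothetical regex, as it would be read from the file

# Used as-is, the pattern matches.
assert re.search(pattern, line) is not None

# Wrapped in literal r""" ... """ characters, the regex now also requires
# the text to contain an 'r' followed by three double quotes, and to end
# with three more double quotes.
wrapped = 'r"""' + pattern + '"""'
assert re.search(wrapped, line) is None
```

The `r"""..."""` notation only means something inside Python source code. Once it is concatenated onto a string at runtime, it is just seven more characters for the regex engine to match.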

You have probably seen code like this (where text stands for whatever string is being searched):

replaced = re.sub(r"""\w+""", '-', text)

In that case, the r""" tells the Python interpreter that you are writing a "raw" multiline string literal, that is, a string in which backslash sequences are not replaced (such as \n being replaced with a newline). Programmers often use raw strings in Python to quote a regex because they want to use regex sequences (like \w above) without having to escape the backslash. Without a raw string, the regex would have to be written '\\w+', which gets confusing.

In any case, you don't need the triple double quotes at all. The last snippet could simply have been written:

replaced = re.sub(r'\w+', '-', text)
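As a quick check of the claims above, the following sketch compares the different ways of writing the same literal. The input string "hello world" is an arbitrary example.

```python
import re

# A raw literal and an escaped ordinary literal produce the same string.
assert r"\w+" == "\\w+"

# Triple quotes only change how the literal is written, not its value.
assert r"""\w+""" == r"\w+"

# Each run of word characters is replaced by a dash; the space is kept.
replaced = re.sub(r"\w+", "-", "hello world")
print(replaced)  # prints: - -
```

Because the quoting style never reaches the regex engine, a string read from a file needs no quoting at all: it is already the value that a raw literal would have produced.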

Finally, your other problem is that your input file has newlines in it, separating each pattern => replacement pair. So each line is really "pattern => replacement\n", and the trailing newline ends up in your replacement variable. Try doing:

for search in file:
    search = search.rstrip()  # Remove the trailing \n from the input
    pattern, replacement = search.split(separator)
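Putting both fixes together, one possible shape for the whole script is sketched below. This is not the asker's code: the helper names load_rules, apply_rules and rewrite_file are hypothetical, and the sketch assumes every usable line of the rules file contains the ' => ' separator. It strips only the newline (rather than all trailing whitespace) so that a replacement ending in spaces is preserved.

```python
import re

SEPARATOR = " => "


def load_rules(path):
    """Read 'pattern => replacement' lines into (compiled_regex, replacement) pairs."""
    rules = []
    with open(path, "r") as rules_file:
        for raw in rules_file:
            entry = raw.rstrip("\n")          # drop only the trailing newline
            if SEPARATOR not in entry:
                continue                      # skip blank or malformed lines
            pattern, replacement = entry.split(SEPARATOR, 1)
            rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(text, rules):
    """Apply every rule, in order, to a single string."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def rewrite_file(rules_path, target_path):
    """Rewrite target_path in place, applying all rules to each line."""
    rules = load_rules(rules_path)
    with open(target_path, "r") as target:
        lines = target.readlines()
    with open(target_path, "w") as target:
        target.writelines(apply_rules(line, rules) for line in lines)
```

With the files from the question, the call would be rewrite_file("searches.txt", "test.txt"). Loading the rules first means the target file is read and written once, instead of once per rule as in the nested fileinput loop.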
