Reputation: 311
I'm trying to write a simple script that reads regular expressions from one file and then applies the corresponding searches and replacements to another file. This is what I have, but it doesn't work: the file is left unchanged. What am I doing wrong?
import re, fileinput
separator = ' => '
file = open("searches.txt", "r")
for search in file:
    pattern, replacement = search.split(separator)
    pattern = 'r"""' + pattern + '"""'
    replacement = 'r"""' + replacement + '"""'
    for line in fileinput.input("test.txt", inplace=1):
        line = re.sub(pattern, replacement, line)
        print(line, end="")
The file searches.txt looks like this:
<p (class="test">.+?)</p> => <h1 \1</h1>
(<p class="not">).+?(</p>) => \1This was changed by the script\2
and test.txt like this:
<p class="test">This is an element with the test class</p>
<p class="not">This is an element without the test class</p>
<p class="test">This is another element with the test class</p>
I did a test to see if it's getting the expression from the file correctly:
>>> separator = ' => '
>>> file = open("searches.txt", "r")
>>> for search in file:
...     pattern, replacement = search.split(separator)
...     pattern = 'r"""' + pattern + '"""'
...     replacement = 'r"""' + replacement + '"""'
...     print(pattern)
...     print(replacement)
...
r"""<p (class="test">.+?)</p>"""
r"""<h1 \1</h1>
"""
r"""(<p class="not">).+?(</p>)"""
r"""\1This was changed by the script\2"""
The closing triple quotes on the first replacement end up on a new line for some reason. Could this be the cause of my problem?
Upvotes: 0
Views: 648
Reputation:
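Two observations:
1) Use .strip() when reading each line of the file, like so:
pattern, replacement = search.strip().split(separator)
This removes the trailing \n that each line read from the file carries.
2) Use re.escape() rather than the 'r"""' + str + '"""' form you are using if you intend to escape regex metacharacters in the pattern. Note that escaping makes the pattern match literally, so groups such as (...) and quantifiers such as .+? lose their special meaning.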
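To illustrate both points, here is a minimal sketch using the first rule from searches.txt as an in-memory string (the variable names raw_line and text are only for illustration). It shows that .strip() keeps the newline out of the replacement, and that re.escape() is only appropriate when the pattern should be matched literally:

```python
import re

separator = ' => '

# One line as it would be read from searches.txt, trailing newline included.
raw_line = r'<p (class="test">.+?)</p> => <h1 \1</h1>' + '\n'
text = '<p class="test">This is an element with the test class</p>'

# Without strip(), the newline stays attached to the replacement.
_, unstripped = raw_line.split(separator)
print(repr(unstripped))

# With strip(), the replacement is clean.
pattern, replacement = raw_line.strip().split(separator)
print(repr(replacement))

# Used as a regex, the pattern matches the sample text.
print(re.search(pattern, text) is not None)

# Escaped, the pattern only matches itself literally, so it no longer
# matches the sample text.
print(re.search(re.escape(pattern), text) is not None)
```

Since the rules in the question rely on groups and backreferences, re.escape() is probably not what is wanted there.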
Upvotes: 1
Reputation: 2790
You don't need
pattern = 'r"""' + pattern + '"""'
In the call to re.sub, pattern should be the actual regex, i.e. <p (class="test">.+?)</p>. When you wrap it in r""" and """, those characters become a literal part of the pattern, so it never matches the text in your file.
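A quick way to see this, as a small sketch using one line from test.txt (the names text and wrapped are only for illustration):

```python
import re

text = '<p class="test">This is an element with the test class</p>'

# The pattern exactly as it appears in searches.txt.
pattern = r'<p (class="test">.+?)</p>'

# What the script in the question builds: the characters r, ", ", "
# are now part of the regex and must appear in the text for a match.
wrapped = 'r"""' + pattern + '"""'

print(re.search(pattern, text) is not None)   # plain pattern
print(re.search(wrapped, text) is not None)   # wrapped pattern
```

The wrapped version can only match text that itself contains r""" followed by the element and then """, which test.txt does not.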
You have probably seen code like this:
replaced = re.sub(r"""\w+""", '-', text)  # text is the string being searched
In that case, the r""" tells the Python interpreter that what follows is a "raw" triple-quoted string: the r prefix means backslash sequences are not interpreted (for example, \n is not replaced with a newline), and the triple quotes allow the literal to span several lines. Programmers often use raw strings in Python for regexes because they want to use regex sequences (like \w above) without having to escape the backslash. Without a raw string, the regex would have to be written '\\w+', which gets confusing.
In any case, you don't need the triple double quotes at all. The last snippet could simply have been written:
replaced = re.sub(r'\w+', '-', text)  # text is the string being searched
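The following sketch shows that the r prefix and the quoting style only affect how the literal is written in source code, not the resulting string (the sample string in text is made up for illustration):

```python
import re

# All three literals produce the same three-character string: \ w +
same = r'\w+' == '\\w+' == r"""\w+"""
print(same)

# A raw literal keeps the backslash; a normal literal interprets it.
print(len(r'\n'), len('\n'))

# re.sub needs the string to search as its third argument.
text = 'hello world'
replaced = re.sub(r'\w+', '-', text)
print(replaced)
```

A string read from a file is already plain data, so there is nothing to make "raw": the backslash in \1 arrives as a real backslash.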
Finally, your other problem is that each line of your input file ends with a newline. What you read is really "pattern => replacement\n", so the trailing newline ends up in your replacement variable. Try:
for search in file:
    search = search.rstrip()  # Remove the trailing \n from the input
    pattern, replacement = search.split(separator)
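Putting the pieces together, one possible corrected version of the script might look like the sketch below. The function names load_rules and apply_rules are hypothetical, and reading all rules first (instead of reopening test.txt once per rule, as the original does) is a design choice of this sketch rather than a requirement. It also strips only the newline, so trailing spaces in a replacement are preserved.

```python
import fileinput
import re

SEPARATOR = ' => '


def load_rules(path):
    """Read (compiled pattern, replacement) pairs, one rule per line."""
    rules = []
    with open(path) as f:
        for search in f:
            search = search.rstrip('\n')  # drop the trailing newline
            if not search:
                continue  # skip blank lines
            # No quote wrapping: the text from the file is the regex.
            pattern, replacement = search.split(SEPARATOR, 1)
            rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(rules, target_path):
    """Rewrite target_path in place, applying every rule to every line."""
    with fileinput.input(target_path, inplace=True) as lines:
        for line in lines:
            for regex, replacement in rules:
                line = regex.sub(replacement, line)
            # With inplace=True, print writes into the file being edited.
            print(line, end='')


# Example usage with the file names from the question:
# apply_rules(load_rules('searches.txt'), 'test.txt')
```

Each line of test.txt is passed through every rule in order, so a later rule sees the output of an earlier one.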
Upvotes: 3