Reputation: 5152
My Python script reads each line of a file and does many regex replacements on each line.
If a regex matches, skip to the next line.
Is there any way to speed up this kind of script?
Is it worth calling subn instead, checking whether a replacement was done, and then skipping the remaining regexes?
If I compile the regexes, is it possible to keep all the compiled regexes in memory?
for file in files:
    for line in file:
        re.sub()  # <--- ~100 re.sub calls
PS: the replacement varies for each regex.
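To make the setup concrete, here is a minimal sketch of what I mean (the patterns and replacements are made up; the real script has ~100): precompile the patterns once, then use subn so I can tell whether a substitution happened and skip the remaining patterns for that line.

```python
import re

# Hypothetical (pattern, replacement) pairs -- placeholders for the real ~100 rules.
rules = [
    (re.compile(r"\bfoo\b"), "bar"),
    (re.compile(r"\d+"), "#"),
    (re.compile(r"\s+$"), ""),
]

def process_line(line):
    for pattern, replacement in rules:
        # subn returns (new_string, number_of_substitutions)
        line, count = pattern.subn(replacement, line)
        if count:  # a replacement happened: skip the remaining patterns
            break
    return line

print(process_line("foo 123"))  # bar 123  (first rule matched, rest skipped)
```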
Upvotes: 0
Views: 682
Reputation: 20644
As @Tim Pietzcker said, you could reduce the number of regexes by combining them into alternatives. You can determine which alternative matched by using the lastindex attribute of the match object.
Here's an example of what you could do:
>>> import re
>>> replacements = {1: "<UPPERCASE LETTERS>", 2: "<lowercase letters>", 3: "<Digits>"}
>>> def replace(m):
...     return replacements[m.lastindex]
...
>>> re.sub(r"([A-Z]+)|([a-z]+)|([0-9]+)", replace, "ABC def 789")
'<UPPERCASE LETTERS> <lowercase letters> <Digits>'
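Since your replacement varies per regex, named groups may be easier to keep in sync than numeric indices; this variation of the example above uses lastgroup (the name of the matched group) instead of lastindex. The group names and replacement strings here are made up:

```python
import re

# Map each named alternative to its replacement.
replacements = {"upper": "<U>", "lower": "<l>", "digits": "<9>"}

def replace(m):
    # lastgroup is the name of the alternative that matched.
    return replacements[m.lastgroup]

combined = re.compile(r"(?P<upper>[A-Z]+)|(?P<lower>[a-z]+)|(?P<digits>[0-9]+)")
print(combined.sub(replace, "ABC def 789"))  # <U> <l> <9>
```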
Upvotes: 2
Reputation: 336078
You should probably do three things:
1. Compile your regexes once instead of recompiling them for every line.
2. Combine them into a single regex using alternation, as the other answer shows.
3. Read each file's contents in one go and substitute over the whole string instead of line by line.
This gives you something like:
regex = re.compile(r"My big honking regex")
for datafile in files:
    content = datafile.read()
    result = regex.sub("Replacement", content)
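Putting the three steps together with a replacement function (the combined pattern and filenames here are placeholders), the whole thing might look like:

```python
import re

# Hypothetical combined pattern, compiled once outside the file loop.
regex = re.compile(r"(cat)|(dog)")
replacements = {1: "feline", 2: "canine"}

def rewrite(content):
    # One sub() call over the entire file contents instead of per line.
    return regex.sub(lambda m: replacements[m.lastindex], content)

def process(paths):
    # Per-file driver: read everything, substitute once, write back out.
    for path in paths:
        with open(path) as f:
            content = f.read()
        with open(path + ".out", "w") as f:
            f.write(rewrite(content))

print(rewrite("cat and dog"))  # feline and canine
```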
Upvotes: 2