Bear

Reputation: 5152

Speed up a series of regex replacement in python

My Python script reads each line in a file and performs many regex replacements on each line.

If a regex succeeds, it skips to the next line.

Is there any way to speed up this kind of script?
Is it worth calling subn instead, checking whether a replacement was done, and then skipping the remaining patterns?
If I compile the regexes, is it possible to store all the compiled regexes in memory?

for file in files:
    for line in file:
        re.sub() # <--- ~ 100 re.sub

PS: the replacement varies for each regex
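To sketch what the question describes: you can compile all the regexes once, keep them in a list in memory, and use `subn` (which returns the new string and the number of substitutions) to skip the remaining patterns as soon as one fires. The patterns and replacements below are hypothetical stand-ins for the ~100 real ones.

```python
import re

# Hypothetical (pattern, replacement) rules standing in for the real ~100.
# Compiling once up front keeps the compiled objects in memory for reuse.
rules = [
    (re.compile(r"\bcolour\b"), "color"),
    (re.compile(r"\bcentre\b"), "center"),
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "<date>"),
]

def process_line(line):
    for pattern, replacement in rules:
        new_line, count = pattern.subn(replacement, line)
        if count:  # a replacement happened: skip the remaining patterns
            return new_line
    return line  # no rule matched; keep the line unchanged

print(process_line("the colour of the sky"))  # -> 'the color of the sky'
```

Whether the early exit actually helps depends on how often the first few patterns match; if most lines fall through all 100 patterns, combining them (as the answers below suggest) is the bigger win.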

Upvotes: 0

Views: 682

Answers (2)

MRAB

Reputation: 20644

As @Tim Pietzcker said, you could reduce the number of regexes by making them alternatives. You can determine which alternative matched by using the 'lastindex' attribute of the match object.

Here's an example of what you could do:

>>> import re
>>> replacements = {1: "<UPPERCASE LETTERS>", 2: "<lowercase letters>", 3: "<Digits>"}
>>> def replace(m):
...     return replacements[m.lastindex]
...
>>> re.sub(r"([A-Z]+)|([a-z]+)|([0-9]+)", replace, "ABC def 789")
'<UPPERCASE LETTERS> <lowercase letters> <Digits>'
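The same idea works with named groups, which can be easier to maintain than numeric indices when there are many alternatives: `Match.lastgroup` gives the *name* of the group that matched. A hedged variant of the example above:

```python
import re

# Replacements keyed by group name instead of group number.
replacements = {
    "upper": "<UPPERCASE LETTERS>",
    "lower": "<lowercase letters>",
    "digits": "<Digits>",
}

# One combined pattern; each alternative is a named group.
pattern = re.compile(r"(?P<upper>[A-Z]+)|(?P<lower>[a-z]+)|(?P<digits>[0-9]+)")

def replace(m):
    # lastgroup is the name of the last (here, only) group that matched.
    return replacements[m.lastgroup]

print(pattern.sub(replace, "ABC def 789"))
# '<UPPERCASE LETTERS> <lowercase letters> <Digits>'
```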

Upvotes: 2

Tim Pietzcker

Reputation: 336078

You should probably do three things:

  1. Reduce the number of regexes. Depending on differences in the substitution part, you might be able to combine them all into a single one. Using careful alternation, you can determine the sequence in which parts of the regex will be matched.
  2. If possible (depending on file size), read the file into memory completely.
  3. Compile your regexes (mainly for readability; it won't matter in terms of speed as long as the number of regexes stays below 100, since the `re` module caches that many recently used compiled patterns).

This gives you something like:

regex = re.compile(r"My big honking regex")
for datafile in files:
    content = datafile.read()
    result = regex.sub("Replacement", content)

Upvotes: 2
