Reputation: 73
I would like to know if a RegEX engine, before to try to match a regex, checks if the data has a minimum length that the regexp requires. For example the regex "a{1000}" in a data composed of 999 "a", fails. The result can be obtained avoiding to apply the regex, and only performing some checks to the length of the data (and the minimum of the regex). So, generically, a RegEX engine performs this kind of tests? In particular I'm interested to know if the re module of Python does this.
Upvotes: 2
Views: 367
Reputation: 3203
In particular I'm interested to know if the re module of Python does this.
A measurement suggests that it does.
import re
import timeit
def test(charsInString, charsInRegex):
regex = re.compile('a{'+str(charsInRegex)+'}')
string = 'a'*charsInString;
for i in range(1, 200000):
regex.match(string)
print(timeit.timeit("test(1, 1)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(1, 2)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(1, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(4999, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(5000, 5000)", setup="from __main__ import test", number=1))
print(timeit.timeit("test(50000, 5000)", setup="from __main__ import test", number=1))
Output:
0.9117504503834146
0.8135033788142646
0.819454105947109
0.8154557798237785
15.441637204298287
15.412751909222905
And a more complex one:
import re
import timeit
def test2(charsInString):
regex = re.compile('((ab{3,5}c+){5000,6000}d)+e*f')
string = 'abbbbcc'*charsInString;
for i in range(1, 100000):
regex.match(string)
print(timeit.timeit("test2(1)", setup="from __main__ import test2", number=1))
print(timeit.timeit("test2(3571)", setup="from __main__ import test2", number=1))
print(timeit.timeit("test2(3572)", setup="from __main__ import test2", number=1))
Output:
0.04918821760123643
0.04305112491748375
60.76094317352544
Upvotes: 1