Reputation: 1680
In Python 2.7 and 3, the following works:
>>> re.search(r"a{1,9999}", 'aaa')
<_sre.SRE_Match object at 0x1f5d100>
but this gives an error:
>>> re.search(r"a{1,99999}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/usr/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
RuntimeError: invalid SRE code
It seems like there is an upper limit on the number of repetitions allowed. Is this part of the regular expression specification, or a Python-specific limitation? If Python-specific, is the actual number documented somewhere, and does it vary between implementations?
Upvotes: 12
Views: 864
Reputation: 129507
A quick manual binary search revealed the answer, specifically 65535:
>>> re.search(r"a{1,65535}", 'aaa')
<_sre.SRE_Match object at 0x2a9a68>
>>>
>>> re.search(r"a{1,65536}", 'aaa')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search
return _compile(pattern, flags).search(string)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile
p = sre_compile.compile(pattern, flags)
File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
This is discussed here:
The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.
and
The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
Thanks to the authors of the comments below for pointing a few more things out:
_sre.c
. (@LukasGraf)There is a constant MAXREPEAT
in sre_constants.py
that holds this max repetition value:
>>> import sre_constants
>>>
>>> sre_constants.MAXREPEAT
65535
(@MarkkuK. and @hcwhsa)
Upvotes: 14