mojones
mojones

Reputation: 1680

What's the maximum number of repetitions allowed in a Python regex?

In Python 2.7 and 3, the following works:

>>> re.search(r"a{1,9999}", 'aaa')
<_sre.SRE_Match object at 0x1f5d100>

but this gives an error:

>>> re.search(r"a{1,99999}", 'aaa')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/lib/python2.7/re.py", line 142, in search
   return _compile(pattern, flags).search(string)
   File "/usr/lib/python2.7/re.py", line 240, in _compile
   p = sre_compile.compile(pattern, flags)
   File "/usr/lib/python2.7/sre_compile.py", line 523, in compile
   groupindex, indexgroup
RuntimeError: invalid SRE code

It seems like there is an upper limit on the number of repetitions allowed. Is this part of the regular expression specification, or a Python-specific limitation? If Python-specific, is the actual number documented somewhere, and does it vary between implementations?

Upvotes: 12

Views: 864

Answers (1)

arshajii
arshajii

Reputation: 129507

A quick manual binary search revealed the answer, specifically 65535:

>>> re.search(r"a{1,65535}", 'aaa')
<_sre.SRE_Match object at 0x2a9a68>
>>> 
>>> re.search(r"a{1,65536}", 'aaa')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded

This is discussed here:

The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.

and

The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".


Thanks to the authors of the comments below for pointing a few more things out:

  • CPython implements this limitation in _sre.c. (@LukasGraf)
  • There is a constant MAXREPEAT in sre_constants.py that holds this max repetition value:

    >>> import sre_constants
    >>> 
    >>> sre_constants.MAXREPEAT
    65535
    

    (@MarkkuK. and @hcwhsa)

Upvotes: 14

Related Questions