user3412839
user3412839

Reputation: 576

Valid regular expression

In order to check to see if a string is a valid regular expression I use the following piece of code:

import re

try:
    re.compile('(0*|2*)')
    is_valid = True
    print(is_valid)
except re.error:
    is_valid = False
    print(is_valid)

My question is how does re.compile('(0*|2*)') check to see if the string passed in is a valid regular expression. In other words, what is it doing behind the scenes to the string to check if it is valid or not. What I thought it was doing it maybe turning the string to a list, in example the string '(0*|2*)' as a list would be:

['(', '0', '*', '|', '2', '*', ')']

and then check to see if the first and last items are valid when combined and if it is move on to the second item and second last item and repeat the process, however this is not the case because it would return false at *2.

If anyone can explain the algorithm/how it checks to see if a string passed in is a valid regular expression in terms of pseudocode at least that would be really appreciated.

Upvotes: 2

Views: 87

Answers (1)

Floris
Floris

Reputation: 46365

The re.compile command parses the string. You can see what it is doing with the re.DEBUG flag:

re.compile('(0*|2*)',re.DEBUG)
subpattern 1
  branch
    max_repeat 0 65535
      literal 48
  or
    max_repeat 0 65535
      literal 50
<_sre.SRE_Pattern object at 0x101b6b780>

When your expression has an error, you will see exactly what and how:

re.compile('(0*|2*\)',re.DEBUG)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis

Upvotes: 1

Related Questions