Reputation: 576
In order to check to see if a string is a valid regular expression I use the following piece of code:
import re
try:
re.compile('(0*|2*)')
is_valid = True
print(is_valid)
except re.error:
is_valid = False
print(is_valid)
My question is how does re.compile('(0*|2*)')
check to see if the string passed in is a valid regular expression. In other words, what is it doing behind the scenes to the string to check if it is valid or not. What I thought it was doing it maybe turning the string to a list, in example the string '(0*|2*)' as a list would be:
['(', '0', '*', '|', '2', '*', ')']
and then check to see if the first and last items are valid when combined and if it is move on to the second item and second last item and repeat the process, however this is not the case because it would return false at *2.
If anyone can explain the algorithm/how it checks to see if a string passed in is a valid regular expression in terms of pseudocode at least that would be really appreciated.
Upvotes: 2
Views: 87
Reputation: 46365
The re.compile
command parses the string. You can see what it is doing with the re.DEBUG
flag:
re.compile('(0*|2*)',re.DEBUG)
subpattern 1
branch
max_repeat 0 65535
literal 48
or
max_repeat 0 65535
literal 50
<_sre.SRE_Pattern object at 0x101b6b780>
When your expression has an error, you will see exactly what and how:
re.compile('(0*|2*\)',re.DEBUG)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 190, in compile
return _compile(pattern, flags)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 244, in _compile
raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis
Upvotes: 1