Reputation: 2951
I've been developing a program that has a function that takes a string like so: ((1.(0.2)2).0)
and returns True
if it is a regex. Here's mmy code so far:
def is_regex(s):
#Takes a string 's' and produces True if it is a valid regular expression
#But False otherwise
ex = '( + 3* + | + 6* + )'
leaves = ['1', '2', '0', '1*', '2*', '0*']
internal_nodes = ['|', '.']
stripped_string = s.strip('()')
if len(s.strip('()')) == 1:
if '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()') or 'e' in s.strip('()'):
return True
elif len(s.strip('()')) == 0:
return True
elif stripped_string in leaves[3:6]:
return True
elif len(stripped_string) == 3:
if stripped_string[0] in leaves and stripped_string[2] in leaves:
if stripped_string[1] in internal_nodes:
return True
elif '.' in s and len(stripped_string) > 3:
dot_position = s.rfind('.')
if s.rfind('.') > s.rfind(')', 0, len(s)-1): #if '.' is only surrounded by one set of () then it is the root of the tree
is_regex(s[dot_position +1])
The idea here is I want to find the root of the tree, and check if it's two children are valid regexes, if they are I move on a recurse until I'm at the leaf, if it passes all the tests until it hits the leaf then the entire regex is True
.
For the last line I have there is_regex(s[dot_position +1])
I'm not getting anything returned, even though I know s[dot_position +1]
returns 0
so there should be no problem. In this line I'm checking the right child of .
which is 0
.
Edit: Another question: I need to test whether or not the left child is true as well. How would I do this? Wouldn't I need to pass both to is_regex
? Or should I check if both right and left are true then proceed? This is confusing
Upvotes: 0
Views: 132
Reputation: 35059
This is probably the most common recursion bug in existence. If you don't return
the result of the recursive call, Python does not return it for you - it keeps running the function as normal. Since there's nothing else after it, it falls off the end without returning - and Python's rules mean that it will then implicitly return None
. The result of is_regex(s[dot_position + 1])
is ignored, unless you explicitly return it (or otherwise use it somehow).
You have a similar bug in two of the other paths through this function:
if len(s.strip('()')) == 1:
if '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()') or 'e' in s.strip('()'):
return True
elif len(stripped_string) == 3:
if stripped_string[0] in leaves and stripped_string[2] in leaves:
if stripped_string[1] in internal_nodes:
return True
in both of these cases, if the outer if triggers but the inner ifs fail, you will end up returning None. This isn't as serious an issue, because None
will still test false when calling code does if is_regex(not_a_regex)
- but for the sake of consistency and explicitly handling these cases (instead of essentially forgetting them and having it happen to work), you probably want to return False
. The easiest way to do that is to just return the boolean expressions instead of testing them:
if len(s.strip('()')) == 1:
return '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()') or 'e' in s.strip('()')
elif len(stripped_string) == 3:
return stripped_string[0] in leaves and stripped_string[2] in leaves and stripped_string[1] in internal_nodes
For testing the left child as well, you would indeed have to recurse into both - separately, mind (passing them into the same recursive call might be doable, but is unlikely to work). I would do it this way:
dot_position = s.rfind('.')
if s.rfind('.') > s.rfind(')', 0, len(s)-1): #if '.' is only surrounded by one set of () then it is the root of the tree
left_child = s[dot_position - 1]
right child = s[dot_position + 1]
return is_regex(left_child) and is_regex(right_child)
This is a very common pattern in any algorithm involving tree structures: test something about the value in the current node, and then test all of its subtrees by recursively calling the same routine on each child in turn; return a result that is dependent on the results of all the recursive calls. Wikipedia calls this a 'depth-first pre-order walk'.
Upvotes: 1