Amon
Amon

Reputation: 2951

Using recursion to indentify valid regexes

I've been developing a program that has a function that takes a string like so: ((1.(0.2)2).0) and returns True if it is a regex. Here's mmy code so far:

def is_regex(s):
    #Takes a string 's' and produces True if it is a valid regular expression
    #But False otherwise
    ex = '( + 3* + | + 6* + )'
    leaves = ['1', '2', '0', '1*', '2*', '0*']
    internal_nodes = ['|', '.']
    stripped_string = s.strip('()')

    if len(s.strip('()')) == 1:
        if '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()')  or 'e' in s.strip('()'):
        return True
    elif len(s.strip('()')) == 0:
        return True
    elif stripped_string in leaves[3:6]:
        return True 
    elif len(stripped_string) == 3:
        if stripped_string[0] in leaves and stripped_string[2] in leaves:
            if stripped_string[1] in internal_nodes:
                return True 
    elif '.' in s and len(stripped_string) > 3:
        dot_position = s.rfind('.')
        if s.rfind('.') > s.rfind(')', 0, len(s)-1): #if '.' is only surrounded by one set of () then it is the root of the tree
            is_regex(s[dot_position +1])

The idea here is I want to find the root of the tree, and check if it's two children are valid regexes, if they are I move on a recurse until I'm at the leaf, if it passes all the tests until it hits the leaf then the entire regex is True.

For the last line I have there is_regex(s[dot_position +1]) I'm not getting anything returned, even though I know s[dot_position +1] returns 0 so there should be no problem. In this line I'm checking the right child of . which is 0.

Edit: Another question: I need to test whether or not the left child is true as well. How would I do this? Wouldn't I need to pass both to is_regex? Or should I check if both right and left are true then proceed? This is confusing

Upvotes: 0

Views: 132

Answers (1)

lvc
lvc

Reputation: 35059

This is probably the most common recursion bug in existence. If you don't return the result of the recursive call, Python does not return it for you - it keeps running the function as normal. Since there's nothing else after it, it falls off the end without returning - and Python's rules mean that it will then implicitly return None. The result of is_regex(s[dot_position + 1]) is ignored, unless you explicitly return it (or otherwise use it somehow).


You have a similar bug in two of the other paths through this function:

if len(s.strip('()')) == 1:
    if '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()')  or 'e' in s.strip('()'):
    return True

elif len(stripped_string) == 3:
    if stripped_string[0] in leaves and stripped_string[2] in leaves:
        if stripped_string[1] in internal_nodes:
            return True 

in both of these cases, if the outer if triggers but the inner ifs fail, you will end up returning None. This isn't as serious an issue, because None will still test false when calling code does if is_regex(not_a_regex) - but for the sake of consistency and explicitly handling these cases (instead of essentially forgetting them and having it happen to work), you probably want to return False. The easiest way to do that is to just return the boolean expressions instead of testing them:

if len(s.strip('()')) == 1:
    return '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()')  or 'e' in s.strip('()')

elif len(stripped_string) == 3:
    return stripped_string[0] in leaves and stripped_string[2] in leaves and stripped_string[1] in internal_nodes

For testing the left child as well, you would indeed have to recurse into both - separately, mind (passing them into the same recursive call might be doable, but is unlikely to work). I would do it this way:

dot_position = s.rfind('.')
if s.rfind('.') > s.rfind(')', 0, len(s)-1): #if '.' is only surrounded by one set of () then it is the root of the tree
    left_child =  s[dot_position - 1]
    right child = s[dot_position + 1]
    return is_regex(left_child) and is_regex(right_child)

This is a very common pattern in any algorithm involving tree structures: test something about the value in the current node, and then test all of its subtrees by recursively calling the same routine on each child in turn; return a result that is dependent on the results of all the recursive calls. Wikipedia calls this a 'depth-first pre-order walk'.

Upvotes: 1

Related Questions