Reputation: 153
Desired outcome:
I would like to have a parser function which takes a string of "instructions".
This string will be chopped up using string.split(";") and each piece stripped of whitespace. I want to check each "chop" for a match against a bunch (10+) of regular expressions. Each expression also has capture groups defined, whose values I would later use to "execute the command".
The problem:
I currently have a long and complex if/elif/else statement, which is undesirable because it makes my code harder to manage and harder for others to read.
Idea so far:
Basically, I would like to use dictionaries to emulate a switch statement. I have very little experience with regular expressions; I was able to write the correct expressions to capture what I want in the "instructions", but I am very unfamiliar with the workings of Python's re package.
A step in the right direction would already be a function which, given a single string and a list or dict of regular expressions, returns which of the regexes was matched.
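To make that concrete, this is roughly the kind of helper I have in mind (just a sketch with made-up pattern names, not working code from my project):
import re

def first_match(text, patterns):
    """Return (name, match) for the first pattern that matches, else None."""
    for name, pattern in patterns.items():
        match = pattern.match(text)
        if match:
            return name, match
    return None

# Hypothetical example patterns, only for illustration.
patterns = {
    "not": re.compile(r"^not([-+]?[0-9]+)$"),
    "exact": re.compile(r"^ex([-+]?[0-9]+)$"),
}

print(first_match("not5", patterns))    # e.g. ('not', <re.Match object; span=(0, 4), match='not5'>)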
Example code:
import re

class PreparedConstraintsCollection(ConstraintsCollectionABC):
    not_pattern = re.compile("^not([-+]*[0-9]+)$")
    ex_pattern = re.compile("^ex([-+]*[0-9]+)$")
    more_pattern = re.compile("^>([-+]*[0-9]+)$")
    less_pattern = re.compile("^<([-+]*[0-9]+)$")
    interval_pattern = re.compile("^([-+]*[0-9]+)<x<([-+]*[0-9]+)$")

    def parse_constraints_string(self, restriction_string: str) -> set:
        """
        The overly-complex function to parse the restriction control sequence strings

        Control Sequence   Meaning           Explanation
        -------------------------------------------------------------------------
        +                  Positive only     Allow only positive values
        -                  Negative only     Allow only negative values
        notX               Not X value       Do not allow value X
        exX                Must be X         Only allow value X
        >X                 More than X       Values must be more than X
        <X                 Less than X       Values must be less than X
        M<x<N              Interval M, N     Must be more than M but less than N

        :param restriction_string: a string with control sequences
        :return: the gathered restriction instances, keeping only unique ones
        """
        gathered_constraints = set()
        for control_seq in restriction_string.split(";"):
            stripped = control_seq.strip().replace(" ", "")
            if stripped == "":
                continue
            elif stripped == "+":
                gathered_constraints.add(res_gallery.PositiveConstraint())
            elif stripped == "-":
                gathered_constraints.add(res_gallery.NegativeConstraint())
            elif self.not_pattern.match(stripped):
                searched = re.search(self.not_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.NotExactValueConstraint(param))
            elif self.ex_pattern.match(stripped):
                searched = re.search(self.ex_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.ExactValueConstraint(param))
            elif self.more_pattern.match(stripped):
                searched = re.search(self.more_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.GreaterThanConstraint(param))
            elif self.less_pattern.match(stripped):
                searched = re.search(self.less_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.LessThanConstraint(param))
            elif self.interval_pattern.match(stripped):
                searched = re.search(self.interval_pattern, stripped)
                param1, param2 = float(searched.group(1)), float(searched.group(2))
                gathered_constraints.add(res_gallery.IntervalConstraint(param1, param2))
            else:
                raise ValueError("Restriction string could not be parsed!")
        return gathered_constraints
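For reference, the table-driven shape I am hoping to end up with looks roughly like this (an untested sketch of my own, reusing the patterns and res_gallery classes from above; it assumes every captured group can be passed to the constraint class as a float):
    def parse_constraints_string(self, restriction_string: str) -> set:
        # Literal control sequences that carry no parameters.
        literal_table = {
            "+": res_gallery.PositiveConstraint,
            "-": res_gallery.NegativeConstraint,
        }
        # Pattern -> constraint factory; captured groups become float arguments.
        pattern_table = [
            (self.not_pattern, res_gallery.NotExactValueConstraint),
            (self.ex_pattern, res_gallery.ExactValueConstraint),
            (self.more_pattern, res_gallery.GreaterThanConstraint),
            (self.less_pattern, res_gallery.LessThanConstraint),
            (self.interval_pattern, res_gallery.IntervalConstraint),
        ]
        gathered_constraints = set()
        for control_seq in restriction_string.split(";"):
            stripped = control_seq.strip().replace(" ", "")
            if stripped == "":
                continue
            if stripped in literal_table:
                gathered_constraints.add(literal_table[stripped]())
                continue
            for pattern, constraint_factory in pattern_table:
                match = pattern.match(stripped)
                if match:
                    params = [float(g) for g in match.groups()]
                    gathered_constraints.add(constraint_factory(*params))
                    break
            else:
                raise ValueError("Restriction string could not be parsed!")
        return gathered_constraints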
Upvotes: 1
Views: 141
Reputation: 71451
A possibility for your parser is to write a tokenizer that creates a nested list of all statements along with the token type found for each part:
The first step is to create your grammar and tokenize your input string:
import re
import collections

token = collections.namedtuple('token', ['type', 'value'])

grammar = r'\+|\-|\bnot\b|\bex\b|\>|\<|[a-zA-Z0-9_]+'
tokens = {'plus': r'\+', 'minus': r'\-', 'not': r'\bnot\b', 'ex': r'\bex\b',
          'lt': r'\<', 'gt': r'\>', 'var': r'[a-zA-Z0-9_]+'}

sample_input = 'val1+val23; val1 < val3 < new_variable; ex val3;not secondvar;'

# For each lexeme found by the grammar, look up the first token type whose pattern matches it.
tokenized_grammar = [token([a for a, b in tokens.items() if re.findall(b, i)][0], i)
                     for i in re.findall(grammar, sample_input)]
Now tokenized_grammar stores a list of all tokenized grammar occurrences in the text:
[token(type='var', value='val1'), token(type='plus', value='+'), token(type='var', value='val23'), token(type='var', value='val1'), token(type='lt', value='<'), token(type='var', value='val3'), token(type='lt', value='<'), token(type='var', value='new_variable'), token(type='ex', value='ex'), token(type='var', value='val3'), token(type='not', value='not'), token(type='var', value='secondvar')]
Token types and values can be accessed as objects:
full_types = [(i.type, i.value) for i in tokenized_grammar]
Output:
[('var', 'val1'), ('plus', '+'), ('var', 'val23'), ('var', 'val1'), ('lt', '<'), ('var', 'val3'), ('lt', '<'), ('var', 'new_variable'), ('ex', 'ex'), ('var', 'val3'), ('not', 'not'), ('var', 'secondvar')]
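The lists above are flat, i.e. the tokens of all statements run together. If you want the nested list of statements mentioned at the start, one simple option (a sketch of my own, reusing the grammar and tokens defined above) is to tokenize each ';'-separated chunk separately:
def tokenize(statement):
    # Reuse the grammar and the token-type lookup from above for a single statement.
    return [token([a for a, b in tokens.items() if re.findall(b, i)][0], i)
            for i in re.findall(grammar, statement)]

nested = [tokenize(chunk) for chunk in sample_input.split(';') if chunk.strip()]
# nested[0] holds the tokens of 'val1+val23', nested[1] those of 'val1 < val3 < new_variable', and so on.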
To implement the flow of a switch-case statement, you can create a dictionary with each key being the type of a token, and the value being a class that stores the corresponding value (methods can be added later):
class Plus:
    def __init__(self, storing):
        self.storing = storing

    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.storing)


class Minus:
    def __init__(self, storing):
        self.storing = storing

    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.storing)

...
Then, to create the dictionary:
token_objects = {'plus': Plus, 'minus': Minus, 'not': Not, 'ex': Ex, 'lt': Lt, 'gt': Gt, 'var': Variable}
Then, you can iterate over tokenized_grammar and create a class object for each occurrence:
for t in tokenized_grammar:
    t_obj = token_objects[t.type](t.value)
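As a usage sketch (assuming the remaining classes Not, Ex, Lt, Gt and Variable are defined analogously to Plus and Minus above), you could collect the created objects and inspect them through their __repr__:
# Collect every created object in order; the __repr__ defined above makes them readable.
parsed = [token_objects[t.type](t.value) for t in tokenized_grammar]
print(parsed[:3])    # e.g. [Variable(val1), Plus(+), Variable(val23)]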
Upvotes: 1