Reputation: 116950
I am writing a pythonic parser for a custom language and as of now I have something like this:
re1 = re.compile(r"...")
re2 = re.compile(r"...")
re3 = re.compile(r"...")
re4 = re.compile(r"...")
...
...
Now I am reading the input file, and for each line, if I find a specific keyword, I use the corresponding regular expression. Obviously, this is making my life a living hell because I am doing something like this:
if "keyword1" in line:
    # Uses re1 to match the string
    invoke_handler1()
elif "keyword2" in line:
    # Uses re2 to match the string
    invoke_handler2()
...
At the same time, I do not want to match a given line against every possible regular expression, because that would be wasteful. Without discarding everything I have written up to this point, is there an elegant way of solving this problem and making it more efficient and readable?
Upvotes: 0
Views: 153
Reputation: 6802
You may want to create a data structure that maps keywords to REs. But honestly, I would make REs that fail fast the first priority and just loop over all of them.
An example of a fail-fast RE is one that starts with `^Sometext`: if the first character doesn't match `S`, the rest of the RE is never evaluated.
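To sketch the mapping idea: a dict from keyword to a `(compiled RE, handler)` pair replaces the `if`/`elif` chain, and anchoring each pattern with `^` keeps the match fail-fast. The keywords, patterns, and handlers below are hypothetical stand-ins for whatever the custom language actually uses.

```python
import re

# Hypothetical handlers for illustration; the real ones come from
# the asker's existing code (invoke_handler1, invoke_handler2, ...).
def handle_let(match):
    return ("let", match.group(1))

def handle_print(match):
    return ("print", match.group(1))

# Map each keyword to its compiled RE and handler. The ^ anchor makes
# the RE fail fast: if the line does not start with the keyword, the
# rest of the pattern is never evaluated.
DISPATCH = {
    "let":   (re.compile(r"^let\s+(\w+)"), handle_let),
    "print": (re.compile(r"^print\s+(.*)"), handle_print),
}

def parse_line(line):
    for keyword, (regex, handler) in DISPATCH.items():
        if keyword in line:        # cheap substring test before the RE
            m = regex.match(line)
            if m:
                return handler(m)
    return None                    # no keyword matched
```

Adding a new construct then only requires one new entry in `DISPATCH`, not another `elif` branch.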
Upvotes: 1
Reputation: 12387
I don't think this is the answer you're looking for, but I think you'd have a better time using an actual lexer and tokenizer to parse your language. I suggest looking at and learning to use PLY for this kind of task.
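The core idea behind a PLY-style lexer can be sketched with just the stdlib: combine all token patterns into one RE using named groups, and let the single compiled pattern decide which token matched. The token names and patterns below are hypothetical, not taken from the asker's language.

```python
import re

# Token specification: (name, pattern) pairs, tried in order.
# These are illustrative placeholders, not the asker's real grammar.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),            # whitespace, discarded below
]

# One master RE with a named group per token type.
MASTER_RE = re.compile("|".join(
    f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, skipping whitespace."""
    tokens = []
    for m in MASTER_RE.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

A real lexer generator like PLY adds line tracking, error reporting, and integration with a parser, but the dispatch mechanism is essentially this.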
Upvotes: 2
Reputation: 838716
Rather than rolling your own parser, you could have a look at one of the many parser libraries available for Python.
Upvotes: 3