Legend

Reputation: 116950

How do I improve my parsing technique?

I am writing a pythonic parser for a custom language and as of now I have something like this:

import re
re1 = re.compile(r"...")
re2 = re.compile(r"...")
re3 = re.compile(r"...")
re4 = re.compile(r"...")
...
...

I read the input file line by line, and when a line contains a specific keyword I apply the corresponding regular expression. This is making my life a living hell, because I end up with something like this:

if line.find("keyword1") >= 0:
  # Uses re1 to match the string
  invoke_handler1()
elif line.find("keyword2") >= 0:
  # Uses re2 to match the string
  invoke_handler2()
...

At the same time, I do not want to match a given line against every possible regular expression, because that would be a waste. Without discarding everything I have written so far, is there an elegant way of solving this problem and making it more efficient and readable?

Upvotes: 0

Views: 153

Answers (3)

Chris Nava

Reputation: 6802

You may want to create a data structure that maps keywords to REs. But honestly, I would first try making REs that fail fast and just loop over them all.
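One way to picture the keyword-to-RE mapping is a dictionary whose values pair a compiled pattern with its handler. This is only a minimal sketch: the keywords, patterns, and handler names below (`set`, `print`, `handle_set`, `handle_print`, `parse_line`) are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical handlers; the real ones would come from your own parser.
def handle_set(match):
    return ("set", match.group(1), match.group(2))

def handle_print(match):
    return ("print", match.group(1))

# Keyword -> (compiled regex, handler). Only the regex registered for a
# keyword that actually occurs in the line is ever run.
DISPATCH = {
    "set": (re.compile(r"set\s+(\w+)\s*=\s*(\w+)"), handle_set),
    "print": (re.compile(r"print\s+(\w+)"), handle_print),
}

def parse_line(line):
    for keyword, (pattern, handler) in DISPATCH.items():
        if keyword in line:
            match = pattern.search(line)
            if match:
                return handler(match)
    return None  # no keyword matched this line
```

Adding a new construct then means adding one entry to the dictionary rather than another `elif` branch.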

An example of a fail-fast RE is one that starts with "^Sometext": if the first character doesn't match "S", the rest of the RE is not evaluated.
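A rough sketch of the "loop over them all" approach with anchored patterns might look like the following. The patterns and the names `RULES` and `classify` are assumptions for illustration only; the point is that each pattern is anchored at the start of the line, so a non-matching line is rejected after looking at very little input.

```python
import re

# Hypothetical anchored patterns. Each starts with "^" and a literal word,
# so a line that begins with different text is rejected almost immediately.
RULES = [
    (re.compile(r"^define\s+(\w+)"), "define"),
    (re.compile(r"^include\s+(\S+)"), "include"),
]

def classify(line):
    # Try every rule in order; return the first one that matches.
    for pattern, name in RULES:
        match = pattern.match(line)
        if match:
            return name, match.group(1)
    return None  # no rule matched
```

With patterns like these, no separate keyword check is needed before running the RE.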

Upvotes: 1

Chris Phillips

Reputation: 12387

I don't think this is the answer you're looking for, but I think you'd have a better time using an actual lexer and parser for your language. I suggest looking at and learning to use PLY for this kind of task.
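To give a feel for what a lexer does, here is a minimal tokenizer sketch built only on the standard `re` module. It is not PLY and does not use PLY's API; the token names and the toy grammar (`NUMBER`, `IDENT`, `OP`) are assumptions chosen for illustration.

```python
import re

# Hypothetical token types for a toy language (not PLY's API).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/]"),
    ("SKIP", r"\s+"),      # whitespace, discarded
    ("MISMATCH", r"."),    # any other character is an error
]

# One combined pattern with a named group per token type.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(text):
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens
```

A parser then works on the list of `(kind, text)` pairs instead of raw lines, which is the separation of concerns that tools like PLY formalize.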

Upvotes: 2

Mark Byers

Reputation: 838716

Rather than rolling your own parser, you could have a look at one of the many parser libraries available for Python.

Upvotes: 3
