fedexist
fedexist

Reputation: 175

Parsing regular expressions based on a context free grammar

Good evening, Stack Overflow. I'd like to develop an interpreter for expressions based on a pretty simple context-free grammar:

Grammar

Basically, the language is constituted by 2 base statements

( SET var 25 ) // Output: var = 25
( GET ( MUL var 5 ) ) // Output: 125
( SET var2 ( MUL 30 5 ) ) //Output: var2 = 150

Now, I'm pretty sure about what should I do in order to interpret a statement: 1) Lexical analysis to turn a statement into a sequence of tokens 2) Syntax analysis to get a symbol table (HashMap with the variables and their values) and a syntactic tree (to perform the GET statements) to 3) perform an inorder visit of the tree to get the results I want.

I'd like some advice on the parsing method to read the source file. Considering the parser should ignore any whitespace, tabulation or newline, is it possible to use a Java Pattern to get a general statement I want to analyze? Is there a good way to read a statement weirdly formatted (and possibly more complex) like this

(
  SET var

 25
 )

without confusing the parser with the open and closed parenthesises?

For example

Scanner scan; //scanner reading the source file
String pattern = "..." //ideal pattern I've found to represent an expression
while(scan.hasNext(pattern))
  Interpreter.computeStatement(scan.next(pattern));

would it be a viable option for this problem?

Upvotes: 3

Views: 1271

Answers (2)

fedexist
fedexist

Reputation: 175

In the end, I understood thanks to Ira Baxter that this context free grammar can't be parsed with RegExp and I used the concepts of S-Expressions to build up the interpreter, whose source code you can find here. If you have any question about it (mainly because the comments aren't translated in english, even though I think the code is pretty clear), just message me or comment here.

Basically what I do is:

  • Parse every character and tokenize it (e.g '(' -> is OPEN_PAR, while "SET" -> STATEMENT_SET or a random letter like 'b' is parsed as a VARIABLE )
  • Then, I use the token list created to do a syntactic analysis, which checks the patterns occuring inside the token list, according to the grammar
  • If there's an expression inside the statement, I check recursively for any expression inside an expression, throwing an exception and going to the following correct statement if needed
  • At the end of analysing every single statement, I compute the statement as necessary as for specifications

Upvotes: 1

Stephan
Stephan

Reputation: 43083

Solution proposed by Ira Braxter:

Your title is extremely confused. You appear to want to parse what are commonly called "S-expressions" in the LISP world; this takes a (simple but) context-free grammar. You cannot parse such expressions with regexps. Time to learn about real parsers.


Maybe this will help: stackoverflow.com/a/2336769/120163

Upvotes: 1

Related Questions