Reputation: 3515
I've been writing an lexer/parser/interpreter for my own language and so far all has been working. I've been following the examples over at Ruslan Spivak's blog (Github link to each article).
I wanted to extend my language grammar past what is written in the articles to include more operators like comparisons (<
, >=
, etc.) and also exponents (**
or ^
in my language). I have this grammar:
expression : exponent ((ADD | SUB) exponent)*
exponent : term ((POWER) term)*
# this one is right-associative (powers **)
term : comparison ((MUL | DIV) comparison)*
comparison : factor ((EQUAl | L_EQUAL | LESS
N_EQUAL | G_EQUAL | GREATER) factor)*
# these are all binary operations
factor : NUM | STR | variable
| ADD factor | SUB factor
| LPAREN expr RPAREN
# different types of 'base' types like integers
# also contains parenthesised expressions which are evalutaed first
In terms of parsing tokens, I use the same method as used in Ruslan's blog. Here is one that will parse the exponent
line, which handles addition and subtraction despite its name, as the grammar says that expressions are parsed as
exponent_expr (+ / -) exponent_expr
def exponent(self):
node = self.term()
while self.current_token.type in (ADD, SUB):
token = self.current_token
if token.type == ADD:
self.consume_token(ADD)
elif token.type == SUB:
self.consume_token(SUB)
node = BinaryOperation(left_node=node,
operator=token,
right_node=self.term())
return node
Now this parses left-associative tokens just fine (since the token stream comes left to right naturally), but I am stuck on how to parse right-associative exponents. Look at this expected in/out for reference:
>>> 2 ** 3 ** 2
# should be parsed as...
>>> 2 ** (3 ** 2)
# which is...
>>> 2 ** 9
# which returns...
512
# Mine, at the moment, parses it as...
>>> (2 ** 3) ** 2
# which is...
>>> 8 ** 2
# which returns...
64
To solve this, I tried switching the BinaryOperation()
constructor's left and right nodes to make the current node the right and the new node the left, but this just makes 2**5
parse as 5**2
which gives me 25
instead of the expected 32
.
Any approaches that I could try?
Upvotes: 2
Views: 1694
Reputation: 241901
The fact that your exponent
function actually parses expression
s should have been a red flag. In fact, what you need is an expression
function which parses expressions and an exponent
function which parses exponentiations.
You've also mixed up the precedences of exponentiation and multiplication (and other operations), because 2 * x ** 4
does not mean (2 * x) ** 4
(which would be 16x⁴
), but rather 2 * (x ** 4)
. By the same token, x * 3 < 17
does not mean x * (3 < 17)
, which is how your grammar will parse it.
Normally the precedences for arithmetics look something like this:
comparison <, <=, ==, ... ( lowest precedence)
additive +, -
multiplicative *, /, %
unary +, -
exponentiation **
atoms numbers, variables, parenthesized expressions, etc.
(If you had postfix operators like function calls, they would go in between exponentiation and atoms.)
Once you've reworked your grammar in this form, the exponent parser will look something like this:
def exponent(self):
node = self.term()
while self.current_token.type is POWER:
self.consume_token(ADD)
node = BinaryOperation(left_node=node,
operator=token,
right_node=self.exponent())
return node
The recursive call at the end produces right associativity. In this case recursion is acceptable because the left operand and the operator have already been consumed. Thus the recursive call cannot produce an infinite loop.
Upvotes: 3