Reputation: 33
if statements and while statement keep throwing syntax errors from p_error(p), and PLY tells me that there's a conflict at runtime. The issues are from the if-else and while statements because it was fine before adding them. Any help would be appreciated.
If possible please don't change the implementation much even if it's in bad practice. I just want help understanding it I don't want a complete overhaul (that'd be plagiarism).
import ply.lex as lex
import ply.yacc as yacc
# === Lexical tokens component ===
# List of possible token namesthat can be produced by the lexer
# NAME: variable name, L/RPAREN: Left/Right Parenthesis
tokens = (
'NAME', 'NUMBER',
'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'MODULO', 'EQUALS',
'LPAREN', 'RPAREN',
'IF', 'ELSE', 'WHILE',
'EQUAL', 'NOTEQ', 'LARGE', 'SMALL', 'LRGEQ', 'SMLEQ',
)
# Regular expression rules for tokens format: t_<TOKEN>
# Simple tokens: regex for literals +,-,*,/,%,=,(,) and variable names (alphanumeric)
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_MODULO = r'%'
t_EQUALS = r'='
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_IF = r'if'
t_ELSE = r'else'
t_WHILE = r'while'
t_EQUAL = r'\=\='
t_NOTEQ = r'\!\='
t_LARGE = r'\>'
t_SMALL = r'\<'
t_LRGEQ = r'\>\='
t_SMLEQ = r'\<\='
# complex tokens
# number token
def t_NUMBER(t):
r'\d+' # digit special character regex
t.value = int(t.value) # convert str -> int
return t
# Ignored characters
t_ignore = " \t" # spaces & tabs regex
# newline character
def t_newline(t):
r'\n+' # newline special character regex
t.lexer.lineno += t.value.count("\n") # increase current line number accordingly
# error handling for invalid character
def t_error(t):
print("Illegal character '%s'" % t.value[0]) # print error message with causing character
t.lexer.skip(1) # skip invalid character
# Build the lexer
lex.lex()
# === Yacc parsing/grammar component ===
# Precedence & associative rules for the arithmetic operators
# 1. Unary, right-associative minus.
# 2. Binary, left-associative multiplication, division, and modulus
# 3. Binary, left-associative addition and subtraction
# Parenthesis precedence defined through the grammar
precedence = (
('left', 'PLUS', 'MINUS'),
('left', 'TIMES', 'DIVIDE', 'MODULO'),
('right', 'UMINUS'),
)
# dictionary of names (for storing variables)
names = {}
# --- Grammar:
# <statement> -> NAME = <expression> | <expression>
# <expression> -> <expression> + <expression>
# | <expression> - <expression>
# | <expression> * <expression>
# | <expression> / <expression>
# | <expression> % <expression>
# | - <expression>
# | ( <expression> )
# | NUMBER
# | NAME
# ---
# defined below using function definitions with format string/comment
# followed by logic of changing state of engine
# if statement
def p_statement_if(p):
'''statement : IF LPAREN comparison RPAREN statement
| IF LPAREN comparison RPAREN statement ELSE statement'''
if p[3]:
p[0] = p[5]
else:
if p[7] is not None:
p[0] = p[7]
def p_statement_while(p):
'statement : WHILE LPAREN comparison RPAREN statement'
while(p[3]):
p[5];
# assignment statement: <statement> -> NAME = <expression>
def p_statement_assign(p):
'statement : NAME EQUALS expression'
names[p[1]] = p[3] # PLY engine syntax, p stores parser engine state
# expression statement: <statement> -> <expression>
def p_statement_expr(p):
'statement : expression'
print(p[1])
# comparison
def p_comparison_binop(p):
'''comparison : expression EQUAL expression
| expression NOTEQ expression
| expression LARGE expression
| expression SMALL expression
| expression LRGEQ expression
| expression SMLEQ expression'''
if p[2] == '==':
p[0] = p[1] == p[3]
elif p[2] == '!=':
p[0] = p[1] != p[3]
elif p[2] == '>':
p[0] = p[1] > p[3]
elif p[2] == '<':
p[0] = p[1] < p[3]
elif p[2] == '>=':
p[0] = p[1] >= p[3]
elif p[2] == '<=':
p[0] = p[1] <= p[3]
# binary operator expression: <expression> -> <expression> + <expression>
# | <expression> - <expression>
# | <expression> * <expression>
# | <expression> / <expression>
# | <expression> % <expression>
def p_expression_binop(p):
'''expression : expression PLUS expression
| expression MINUS expression
| expression TIMES expression
| expression DIVIDE expression
| expression MODULO expression'''
if p[2] == '+':
p[0] = p[1] + p[3]
elif p[2] == '-':
p[0] = p[1] - p[3]
elif p[2] == '*':
p[0] = p[1] * p[3]
elif p[2] == '/':
p[0] = p[1] / p[3]
elif p[2] == '%':
p[0] = p[1] % p[3]
# unary minus operator expression: <expression> -> - <expression>
def p_expression_uminus(p):
'expression : MINUS expression %prec UMINUS'
p[0] = -p[2]
# parenthesis group expression: <expression> -> ( <expression> )
def p_expression_group(p):
'expression : LPAREN expression RPAREN'
p[0] = p[2]
# number literal expression: <expression> -> NUMBER
def p_expression_number(p):
'expression : NUMBER'
p[0] = p[1]
# variable name literal expression: <expression> -> NAME
def p_expression_name(p):
'expression : NAME'
# attempt to lookup variable in current dictionary, throw error if not found
try:
p[0] = names[p[1]]
except LookupError:
print("Undefined name '%s'" % p[1])
p[0] = 0
# handle parsing errors
def p_error(p):
print("Syntax error at '%s'" % p.value)
# build parser
yacc.yacc()
# start interpreter and accept input using commandline/console
while True:
try:
s = input('calc > ') # get user input. use raw_input() on Python 2
except EOFError:
break
yacc.parse(s) # parse user input string
Upvotes: 3
Views: 3233
Reputation: 241931
Your basic problem is that your lexer is not recognizing the keywords if
and while
(nor else
), because the t_NAME
pattern will trigger in those cases. The problem and a possible solution are described in section 4.3 of the Ply documentation. The problem is that:
Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).
and the expression for t_NAME
is longer than the simple keyword patterns.
You cannot fix this problem by just making t_NAME
into a lexer function, because tokens defined by functions are checked before tokens defined by strings.
But you can make t_NAME
into a function, and in the function look the matched string up in a dictionary to see whether or not it is a reserved word. (See the example at the end of the linked section, in the paragraph starting "To handle reserved words..."). When you do that, you do not define t_IF
, t_WHILE
and t_ELSE
at all.
The shift-reduce conflict is the "dangling else" problem. If you search for that phrase, you will find various solutions.
The easiest solution is to do nothing, and just ignore the warning, since Ply will do the right thing by default.
The second easiest solution is to add ('if', 'IF'), ('left', 'ELSE')
to the precedence list, and add a precedence marker to the if
production:
'''statement : IF LPAREN comparison RPAREN statement %prec IF
| IF LPAREN comparison RPAREN statement ELSE statement'''
Giving ELSE
a higher precedence value than IF
ensures that when the parser needs to choose between shifting ELSE
as in the second production or reducing with the first production, it chooses the shift (since ELSE
has higher precedence). In fact, that is the default behaviour, so the precedence declaration does not affect the parsing behaviour at all; however, it suppresses the shift-reduce conflict warning because the conflict has been resolved.
For another solution, see this question and answer.
Finally, do look at the comments to your question. Your actions for if
and while
statements are not going to work at all.
Upvotes: 5