I'm using PLY to parse commands for a custom definition file. Commands are defined one per line, and each one should start with a reserved keyword followed by a number of strings. I have successfully managed to write a lexer and parser for the grammar, but I am having problems raising a SyntaxError from within a production.
According to PLY's documentation, this is possible simply by raising a SyntaxError from within the body of a parser rule:
If necessary, a production rule can manually force the parser to enter error recovery. This is done by raising the SyntaxError exception like this:
```python
def p_production(p):
    'production : some production ...'
    raise SyntaxError
```
My code raises a SyntaxError within a production when it encounters invalid syntax, but when I run the program this error is not raised. Here is a minimal working example:
```python
from ply import lex, yacc

class Parser(object):
    # reserved keyword tokens
    reserved = {
        "r": "R"
    }

    # top level tokens
    tokens = [
        'CHUNK',
        'NEWLINE'
    ]

    # add reserved tokens
    tokens += reserved.values()

    # ignore spaces and tabs
    t_ignore = ' \t'

    def __init__(self):
        # lexer and parser handlers
        self.lexer = lex.lex(module=self)
        self.parser = yacc.yacc(module=self)

    def parse(self, text):
        # pass text to yacc
        self.parser.parse(text, lexer=self.lexer)

    # detect new lines
    def t_newline(self, t):
        r'\n+'
        # generate newline token
        t.type = "NEWLINE"
        return t

    def t_CHUNK(self, t):
        r'[a-zA-Z0-9_=.:]+'
        # check if chunk is a keyword
        t.type = self.reserved.get(t.value.lower(), 'CHUNK')
        return t

    def t_error(self, t):
        raise SyntaxError("token error")

    def p_instruction_list(self, p):
        '''instruction_list : instruction
                            | instruction_list instruction'''
        pass

    # match instruction on their own lines
    def p_instruction(self, p):
        '''instruction : command NEWLINE
                       | NEWLINE'''
        pass

    def p_command(self, p):
        '''command : R CHUNK CHUNK CHUNK CHUNK'''
        # parse command
        if p[2] not in ["a", "b"]:
            raise SyntaxError("invalid thing")

    def p_error(self, p):
        raise SyntaxError("parsing error")

if __name__ == "__main__":
    parser = Parser()
    parser.parse("""
r a text text text
r c text text text
r b text text text
""")
```
The above example runs without outputting anything, which means it has successfully parsed the text, even though a syntax error should be raised in p_command due to the line r c text text text (the second token c is invalid; only a or b would be valid).
What am I doing wrong?
You are responsible for printing error messages, and you don't:
One important aspect of manually setting an error is that the p_error() function will NOT be called in this case. If you need to issue an error message, make sure you do it in the production that raises SyntaxError.
I don't believe p_error() should raise SyntaxError. It should just print an appropriate message (or otherwise register the fact that an error occurred) and let error recovery proceed. But in any event, it is not being called in this case, as indicated by the above quote.
I'm not 100% convinced by having the lexer raise SyntaxError either. My preferred strategy for lexical errors is to just pass them through to the parser and thereby centralise error handling in one place.
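One way to pass lexical errors through, sketched under the assumption that an extra token type is acceptable: `t_error` turns the offending character into a token of a made-up type `BADCHAR` that no grammar rule mentions, so the parser rejects it and `p_error()` becomes the single place where errors are reported. `BADCHAR`, `self.errors` and the message format are hypothetical; yacc will warn that `BADCHAR` is defined but unused, which is expected here. The semantic check from `p_command` is left out to keep the focus on the lexer.

```python
from ply import lex, yacc


class Parser(object):
    reserved = {"r": "R"}
    # BADCHAR (hypothetical) is only produced by t_error; no rule uses it.
    tokens = ["CHUNK", "NEWLINE", "BADCHAR"] + list(reserved.values())
    t_ignore = " \t"

    def __init__(self):
        self.errors = []
        self.lexer = lex.lex(module=self)
        self.parser = yacc.yacc(module=self, debug=False)

    def parse(self, text):
        self.errors = []
        self.lexer.lineno = 1
        self.parser.parse(text, lexer=self.lexer)
        return self.errors

    def t_newline(self, t):
        r"\n+"
        t.lexer.lineno += len(t.value)
        t.type = "NEWLINE"
        return t

    def t_CHUNK(self, t):
        r"[a-zA-Z0-9_=.:]+"
        t.type = self.reserved.get(t.value.lower(), "CHUNK")
        return t

    def t_error(self, t):
        # Report nothing here: hand the bad character to the parser as a
        # BADCHAR token. No rule accepts it, so the parser calls p_error().
        t.type = "BADCHAR"
        t.value = t.value[0]
        t.lexer.skip(1)
        return t

    def p_instruction_list(self, p):
        """instruction_list : instruction
                            | instruction_list instruction"""

    def p_instruction(self, p):
        """instruction : command NEWLINE
                       | NEWLINE"""

    def p_command(self, p):
        """command : R CHUNK CHUNK CHUNK CHUNK"""

    def p_error(self, p):
        # The one place where both lexical and grammatical errors are reported.
        if p is None:
            self.errors.append("unexpected end of input")
        else:
            self.errors.append("line %d: unexpected %r" % (p.lineno, p.value))
```

With this arrangement a stray `?` in a command shows up through `p_error()` like any other unexpected token, rather than aborting the lexer.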
If you don't care about error recovery, don't use an error token in any rule. That token is only used for error recovery. If you just want to throw an exception as soon as an error is encountered, do that in p_error, and call p_error explicitly in places where it will not be called automatically (such as token errors and errors detected in semantic actions). You could throw ValueError or something derived from it; I'd stay away from SyntaxError, which has particular meaning to PLY and to Python in general.
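A fail-fast variant, as a sketch: `ParseError` is a hypothetical `ValueError` subclass, so PLY does not intercept it the way it intercepts `SyntaxError` raised in a production. `p_error()` raises it, `t_error` calls `p_error()` explicitly because lexer errors never reach `p_error()` on their own, and the semantic check in `p_command` raises the same exception directly with its own message.

```python
from ply import lex, yacc


class ParseError(ValueError):
    """Hypothetical exception type; deliberately not a SyntaxError."""


class Parser(object):
    reserved = {"r": "R"}
    tokens = ["CHUNK", "NEWLINE"] + list(reserved.values())
    t_ignore = " \t"

    def __init__(self):
        self.lexer = lex.lex(module=self)
        self.parser = yacc.yacc(module=self, debug=False)

    def parse(self, text):
        self.lexer.lineno = 1
        self.parser.parse(text, lexer=self.lexer)

    def t_newline(self, t):
        r"\n+"
        t.lexer.lineno += len(t.value)
        t.type = "NEWLINE"
        return t

    def t_CHUNK(self, t):
        r"[a-zA-Z0-9_=.:]+"
        t.type = self.reserved.get(t.value.lower(), "CHUNK")
        return t

    def t_error(self, t):
        # p_error() is not called for lexer errors, so call it explicitly
        # with the offending character as the token value.
        t.value = t.value[0]
        self.p_error(t)

    def p_instruction_list(self, p):
        """instruction_list : instruction
                            | instruction_list instruction"""

    def p_instruction(self, p):
        """instruction : command NEWLINE
                       | NEWLINE"""

    def p_command(self, p):
        """command : R CHUNK CHUNK CHUNK CHUNK"""
        if p[2] not in ("a", "b"):
            # PLY only catches SyntaxError here; a ValueError subclass
            # propagates straight out of parser.parse().
            raise ParseError("line %d: invalid thing %r" % (p.lineno(2), p[2]))

    def p_error(self, p):
        if p is None:
            raise ParseError("unexpected end of input")
        raise ParseError("line %d: unexpected %r" % (p.lineno, p.value))
```

Calling `Parser().parse(...)` on the question's input should then raise `ParseError` for line 3 instead of returning quietly; the caller decides whether to catch it.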