Sean

Reputation: 1406

Signalling an error from a parser rule in PLY

I'm using PLY to parse commands for a custom definition file. Commands are defined one per line, and each one should start with a reserved keyword followed by a number of strings. I have successfully managed to write a lexer and parser for the grammar, but I am having problems raising a SyntaxError from within a production.

According to PLY's documentation, this is possible simply by raising a SyntaxError from within the body of a parser rule:

If necessary, a production rule can manually force the parser to enter error recovery. This is done by raising the SyntaxError exception like this:

def p_production(p):
    'production : some production ...'
    raise SyntaxError

My code raises a SyntaxError within a production when it encounters invalid input, but when I run the program no error is reported. Here is a minimal working example:

from ply import lex, yacc

class Parser(object):
    # reserved keyword tokens
    reserved = {
        "r": "R"
    }

    # top level tokens
    tokens = [
        'CHUNK',
        'NEWLINE'
    ]

    # add reserved tokens
    tokens += reserved.values()

    # ignore spaces and tabs
    t_ignore = ' \t'

    def __init__(self):
        # lexer and parser handlers
        self.lexer = lex.lex(module=self)
        self.parser = yacc.yacc(module=self)

    def parse(self, text):
        # pass text to yacc
        self.parser.parse(text, lexer=self.lexer)

    # detect new lines
    def t_newline(self, t):
        r'\n+'
        # generate newline token
        t.type = "NEWLINE"
        return t

    def t_CHUNK(self, t):
        r'[a-zA-Z0-9_=.:]+'
        # check if chunk is a keyword
        t.type = self.reserved.get(t.value.lower(), 'CHUNK')
        return t

    def t_error(self, t):
        raise SyntaxError("token error")

    def p_instruction_list(self, p):
        '''instruction_list : instruction
                            | instruction_list instruction'''
        pass

    # match instruction on their own lines
    def p_instruction(self, p):
        '''instruction : command NEWLINE
                       | NEWLINE'''
        pass

    def p_command(self, p):
        '''command : R CHUNK CHUNK CHUNK CHUNK'''
        # parse command
        if p[2] not in ["a", "b"]:
            raise SyntaxError("invalid thing")

    def p_error(self, p):
        raise SyntaxError("parsing error")

if __name__ == "__main__":
    parser = Parser()
    parser.parse("""
    r a text text text
    r c text text text
    r b text text text
    """)

The above example runs without outputting anything, which means it has successfully parsed the text, even though a syntax error should be raised in p_command due to the line r c text text text (the second token c is invalid; only a or b would be valid).

What am I doing wrong?

Upvotes: 2

Views: 1722

Answers (1)

rici

Reputation: 241671

You are responsible for printing error messages, and your code doesn't print any. From the PLY documentation:

One important aspect of manually setting an error is that the p_error() function will NOT be called in this case. If you need to issue an error message, make sure you do it in the production that raises SyntaxError.

I don't believe p_error() should raise SyntaxError. It should just print an appropriate message (or otherwise register the fact that an error occurred) and let error recovery proceed. But in any event, it is not being called in this case, as indicated by the above quote.
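A minimal sketch of reporting the problem from inside the production itself, before handing control to error recovery. It is written as a module-level rule for brevity; in the question's class it would take `self` and could keep the list on the instance. The `errors` list and the message wording are assumptions made for illustration, not part of PLY.

```python
import sys

# Hypothetical record of semantic errors, so the caller can check it after parsing.
errors = []


def p_command(p):
    '''command : R CHUNK CHUNK CHUNK CHUNK'''
    if p[2] not in ("a", "b"):
        message = f"invalid thing {p[2]!r}: expected 'a' or 'b'"
        # p_error() is not called for a manually raised SyntaxError,
        # so the message has to be registered and printed here.
        errors.append(message)
        print(message, file=sys.stderr)
        raise SyntaxError(message)
```

After `parse()` returns, a non-empty `errors` list would tell the caller that at least one command was rejected, even though no exception reached it.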

I'm not 100% convinced by having the lexer raise SyntaxError either. My preferred strategy for lexical errors is to just pass them through to the parser and thereby centralise error handling in one place.

If you don't care about error recovery, don't use an error token in any rule. That token is only used for error recovery. If you just want to throw an exception as soon as an error is encountered, do that in p_error, and call p_error explicitly in places where it will not be called automatically (such as token errors and errors detected in semantic actions). You could throw ValueError or something derived from it; I'd stay away from SyntaxError, which has particular meaning to PLY and to Python in general.
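The fail-fast approach might look roughly like the sketch below. `DefinitionError` and `RaisingErrorHooks` are hypothetical names chosen for this example, not part of PLY. The block only defines the two error hooks, on the assumption that they are mixed into a class like the `Parser` from the question.

```python
class DefinitionError(ValueError):
    """Hypothetical exception for any lexical, syntactic, or semantic error."""


class RaisingErrorHooks:
    """Mixin with PLY error hooks that raise instead of attempting recovery."""

    def t_error(self, t):
        # PLY calls this when no token rule matches; t.value holds the
        # remaining, not yet tokenized input.
        raise DefinitionError(
            f"line {t.lineno}: illegal character {t.value[0]!r}")

    def p_error(self, p):
        # PLY passes None when the error is detected at the end of input.
        if p is None:
            raise DefinitionError("unexpected end of input")
        raise DefinitionError(
            f"line {p.lineno}: unexpected {p.type} {p.value!r}")
```

To try this with the question's code, `Parser` would inherit from `RaisingErrorHooks` in place of defining its own `t_error` and `p_error`, and `p_command` would call `self.p_error(p.slice[2])` instead of raising `SyntaxError` (`p.slice[2]` should be the token object for the second symbol). Since the question's `t_newline` never updates `t.lexer.lineno`, the reported line number would stay at 1 unless that rule is extended to count newlines.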

Upvotes: 1
