Fred

Reputation: 921

PLY: missed "if" statement

This is my first attempt at using PLY, or any lexer/parser tool, so I'm not sure what's wrong.

I'm trying to implement a small assembly language loosely based on Python syntax, especially for if statements with indented blocks. Here's an example:

if d0 == 0x42:
    a1 = d2
d0 = 0

I wrote a lexer that handles INDENT and DEDENT and produces this token list:

LexToken(IF,'if',1,0)
LexToken(REGISTER,'d0',1,3)
LexToken(IS_EQUAL,'==',1,6)
LexToken(NUMBER,'0x42',1,9)
LexToken(COLON,':',1,13)
LexToken(INDENT,'\t',2,15)
LexToken(REGISTER,'a1',2,16)
LexToken(EQUAL,'=',2,19)
LexToken(REGISTER,'d2',2,21)
LexToken(DEDENT,'d0',3,24)
LexToken(REGISTER,'d0',3,24)
LexToken(EQUAL,'=',3,27)
LexToken(NUMBER,'0',3,29)

That seems OK (the value, lineno and lexpos of the INDENT and DEDENT tokens are wrong, but I don't use them).

My parser is:

import ply.yacc as yacc

# Get the token map from the lexer.  This is required.
from asmlexer import tokens, MyLexer
from instr import *


def p_statement_assignment(p):
    'statement : assignment'
    print('statmt1 :', [x for x in p])
    p[0] = p[1]
    
def p_statement_ifstatmt(p):
    'statement : ifstatmt'
    print('statmt2 :', [x for x in p])
    p[0] = p[1]
    
def p_assignment(p):
    'assignment : value EQUAL value'
    print("assignment :", [x for x in p])
    p[0] = Move(p[3], p[1])

def p_ifstatmt(p):
    'ifstatmt : IF condition COLON INDENT statement DEDENT'
    print("IF :", [x for x in p])
    p[0] = If(p[2], body = p[5])

def p_condition_equal(p):
    'condition : value IS_EQUAL value'
    print("condition ==", [x for x in p])
    p[0] = "%s == %s" % (p[1], p[3])
    
def p_value_register(p):
    'value : REGISTER'
    print('register :', [x for x in p])
    p[0] = Register(p[1])

def p_value_number(p):
    'value : NUMBER'
    print('value:', [x for x in p])
    p[0] = Value(p[1])

# Error rule for syntax errors
def p_error(p):
    print(p)
    print("Syntax error in input!")

# Build the parser
class MyParser(object):
    def __init__(self):
        lexer = MyLexer()
        self.lexer = lexer
        self.parser = yacc.yacc()
    
    def parse(self, code):
        self.lexer.input(code)
        result = self.parser.parse(lexer = self.lexer)
        return result

if True:
    with open("test.psm") as f:
        data = f.read()

    parser = MyParser()
    result = parser.parse(data)
    print(result)
    print(result.get_code())

It seems the IF token is missed:

register : [None, 'd0']
value: [None, '0x42']
condition == [None, <instr.Register object at 0x000002429DA5E340>, '==', <instr.Value object at 0x000002429DA5E250>]
register : [None, 'a1']
register : [None, 'd2']
assignment : [None, <instr.Register object at 0x000002429DA2C970>, '=', <instr.Register object at 0x000002429DA5E5E0>]
statmt1 : [None, <instr.Move object at 0x000002429DA5E370>]
LexToken(REGISTER,'d0',3,24)
Syntax error in input!
value: [None, '0']
None
Traceback (most recent call last):
  File ".\asmparser.py", line 137, in <module>
    print(result.get_code())
AttributeError: 'NoneType' object has no attribute 'get_code'

and I don't understand why...

Upvotes: 1

Views: 254

Answers (1)

rici

Reputation: 241721

The start symbol for your grammar is statement, so your grammar describes an input consisting of one statement, either an assignment or a conditional. The token which follows that unitary statement must, therefore, be the end of input. But it's not. It's the REGISTER d0, since your input comprises two statements.

LALR(1) parsers, which is what Ply generates, may use the next token to validate possible reduction actions. [Note 1] If the next token cannot follow the reduced non-terminal, then an error will be signalled. Whether this error is signalled before or after the reduction depends on the particular nature of the parser generator. Some parser generators, like Bison, optimise lookahead aggressively and their parsers will not even read the lookahead token if it is not absolutely necessary. But most parser generators, including Ply, produce parsers which always read the lookahead token before deciding on the next action. [Note 2]

If you want your parser to handle a sequence of statements, you will need a start symbol which expands to a sequence of statements, such as program: | program statement. You'll probably also want to allow the body of your if statement to contain a sequence of statements, rather than just the single statement allowed by your current grammar.
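As a rough illustration of that suggestion, here is a minimal sketch of how the grammar might be rearranged. It is not the asker's actual code: plain tuples and strings stand in for the Move, If, Register and Value classes from the question's instr module, the tokens tuple is assumed from the token list shown in the question, and the names program, block and build_parser are made up for this example.

```python
# Sketch of a revised PLY grammar. Plain tuples and strings stand in for the
# Move/If/Register/Value classes from the question's `instr` module.

# Token names as they appear in the question's token list (assumed complete).
tokens = ('IF', 'REGISTER', 'IS_EQUAL', 'NUMBER', 'COLON',
          'INDENT', 'DEDENT', 'EQUAL')

def p_program_empty(p):
    'program :'
    p[0] = []                      # an empty program is an empty list

def p_program_append(p):
    'program : program statement'
    p[0] = p[1] + [p[2]]           # left-recursive: extend the list so far

def p_block_single(p):
    'block : statement'
    p[0] = [p[1]]                  # a block holds at least one statement

def p_block_append(p):
    'block : block statement'
    p[0] = p[1] + [p[2]]

def p_statement(p):
    '''statement : assignment
                 | ifstatmt'''
    p[0] = p[1]

def p_assignment(p):
    'assignment : value EQUAL value'
    p[0] = ('move', p[3], p[1])    # stand-in for Move(p[3], p[1])

def p_ifstatmt(p):
    'ifstatmt : IF condition COLON INDENT block DEDENT'
    p[0] = ('if', p[2], p[5])      # stand-in for If(p[2], body=p[5])

def p_condition_equal(p):
    'condition : value IS_EQUAL value'
    p[0] = "%s == %s" % (p[1], p[3])

def p_value(p):
    '''value : REGISTER
             | NUMBER'''
    p[0] = p[1]

def p_error(p):
    print("Syntax error at", p)

def build_parser():
    # Imported here so the rule functions above can be read and tested
    # without PLY installed.
    import ply.yacc as yacc
    # Name the start symbol explicitly instead of relying on rule order.
    return yacc.yacc(start='program')
```

With a grammar along these lines, parsing the sample input should yield a list of two statements rather than a single object, so the final result.get_code() call in the question would presumably need to loop over that list.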


Notes

  1. A "reduction action" is what the parser does when it reaches the end of the right-hand side of a production and "reduces" the matched sequence to the single non-terminal on the left-hand side. As part of the reduction, the parser executes the semantic action attached to that production; in the case of Ply, that means calling the p_ function associated with the production being reduced.

  2. Just because the parser has read the lookahead token doesn't mean that it will necessarily consult it before doing the reduction. Many parsers use compressed tables which may combine error actions with the default reduction action.

Upvotes: 2
