Rayyan Cyclegar

Lark Parser raising error when evaluating print statement after if/else statement

So I'm making a programming language using Python and the Lark library for parsing. When I parse the following code:

if 5 == 4 {
    print("TRUE");
}
else {
    print("FALSE");
}
print("Done!");

It raises the following error:

PS E:\ParserAndLexer> & C:/Python38/python.exe e:/ParserAndLexer/lite/lite_transformer.py
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\lark\lexer.py", line 416, in lex
    for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
  File "C:\Python38\lib\site-packages\lark\lexer.py", line 200, in lex
    raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
lark.exceptions.UnexpectedCharacters: No terminal defined for 'p' at line 7 col 1

print("HI");
^

Expecting: {'IF'}

Previous tokens: Token('RBRACE', '}')


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:/ParserAndLexer/lite/lite_transformer.py", line 80, in <module>
    tree = parser.parse(lite_code)
  File "C:\Python38\lib\site-packages\lark\lark.py", line 464, in parse
    return self.parser.parse(text, start=start)
  File "C:\Python38\lib\site-packages\lark\parser_frontends.py", line 148, in parse
    return self._parse(token_stream, start, set_parser_state)
  File "C:\Python38\lib\site-packages\lark\parser_frontends.py", line 63, in _parse
    return self.parser.parse(input, start, *args)
  File "C:\Python38\lib\site-packages\lark\parsers\lalr_parser.py", line 35, in parse
    return self.parser.parse(*args)
  File "C:\Python38\lib\site-packages\lark\parsers\lalr_parser.py", line 86, in parse
    for token in stream:
  File "C:\Python38\lib\site-packages\lark\indenter.py", line 32, in _process
    for token in stream:
  File "C:\Python38\lib\site-packages\lark\lexer.py", line 431, in lex
    raise UnexpectedToken(t, e.allowed, state=e.state)
lark.exceptions.UnexpectedToken: Unexpected token Token('NAME', 'print') at line 7, column 1.
Expected one of:
        * IF

I can't figure out why this is happening. Here is roughly what my code looks like:

from lark import Lark, Transformer, v_args
from lark.indenter import Indenter

class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_type = '_INDENT'  # Lark's Indenter reads INDENT_type (lowercase "type")
    DEDENT_type = '_DEDENT'
    tab_len = 8

@v_args(inline=True)
class MainTransformer(Transformer):
    def __init__(self):
        ...

    def number(self, value):
        return Integer(value)

    def string(self, value):
        value = str(value).strip('"')
        return String(value)

    def div(self, val1, val2):
        return Div(val1, val2)

    def print_statement(self, value):
        return Print(value)

    def if_statement(self, expr1, expr2, eval_expr):
        return If(expr1, expr2, eval_expr)
    
    def if_else_statement(self, expr1, expr2, eval_expr, else_statement):
        return If(expr1, expr2, eval_expr, else_statement)
    
    def if_statements(self, *values):
        for value in values:
            value.eval()

    def statement(self, *values):
        for value in values:
            value.eval()

grammar = '''
?start: expr*
      | statement* -> statement
      | if* -> if_statements

?if : "if" expr "==" expr "{" statement+ "}" -> if_statement
    | "if" expr "==" expr "{" expr+ "}" -> if_statement
    | "if" expr "==" expr "{" statement+ "}" "else" "{" statement+ "}" -> if_else_statement

?statement: "print" "(" expr ")" ";"  -> print_statement
          | "input" "(" expr ")" ";"  -> input_statement
          | NAME "=" expr ";"      -> assign_var
          | NAME "=" "input" "(" expr ")" ";" -> var_input_statement

?expr: STRING            -> string
     | NUMBER            -> number
     | NAME              -> get_var
%import common.ESCAPED_STRING -> STRING 
%import common.NUMBER
%import common.CNAME -> NAME
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL
'''
class Print():
    def __init__(self, value):
        self.value = value

    def eval(self):
        return print(self.value.eval())

class Input():
    def __init__(self, value):
        self.value = value

    def eval(self):
        return input(self.value.eval())

class String():
    def __init__(self, value):
        self.value = str(value).strip('"')

    def eval(self):
        return self.value

class Integer():
    def __init__(self, value):
        self.value = int(value)

    def eval(self):
        return self.value

class If():
    def __init__(self, expr1, expr2, eval_expr, else_statement=None):
        self.expr1 = expr1
        self.expr2 = expr2
        self.eval_expr = eval_expr
        self.else_statement = else_statement
        
    def eval(self):
        if self.expr1.eval() == self.expr2.eval():
            return self.eval_expr.eval()
        else:
            if self.else_statement == None:
                return
            else:
                return self.else_statement.eval()

parser = Lark(grammar, parser='lalr', postlex=MainIndenter())
test_input = '''
if 5 == 5 {
    print("True");
}
else {
    print("False");
}
print("Done");
'''

if __name__ == '__main__':
    tree = parser.parse(test_input)
    print(MainTransformer().transform(tree))

Answers (1)

ggorlen

I'm not familiar with Lark, but this looks wrong:

?start: expr*
      | statement* -> statement
      | if* -> if_statements

What this says is "expand the start rule to zero or more exprs, zero or more statements, or zero or more ifs." This means your grammar doesn't support mixing the three kinds of productions, which is exactly what the source string you're trying to parse does. If the program starts with an if, the rest of the program has to be all ifs, so a statement such as print("Done!"); after it is rejected. The error message says as much: after the closing } it expects another IF.
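
To make the "one kind only" behaviour concrete, here's a small sketch that doesn't use Lark at all. It encodes each top-level item as a single letter (I for an if block, S for a statement, E for a bare expr; this encoding is made up purely for illustration) and checks sequences against a Python regular expression with the same shape as the original start rule.

```python
import re

# Hypothetical encoding of top-level items:
#   I = if / if-else block, S = statement (print, input, assignment), E = bare expr
# The original rule `start: expr* | statement* | if*` has the same shape as:
ORIGINAL_START = re.compile(r"E*|S*|I*")


def accepts(pattern, items):
    """Return True if the whole sequence of item kinds matches the pattern."""
    return pattern.fullmatch(items) is not None


print(accepts(ORIGINAL_START, "II"))  # True: only ifs
print(accepts(ORIGINAL_START, "SS"))  # True: only statements
print(accepts(ORIGINAL_START, "IS"))  # False: an if block followed by print("Done!");
```

Whichever alternative matches the first item fixes the kind for the whole program, which is why the parser only accepts another IF after the first block.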

You can fix this with something like:

?start: stmt*
?stmt: expr
     | statement -> statement
     | if -> if_statements

This grammar says "expand the start rule to zero or more stmts, where a stmt is an expr, a statement, or an if." Because the choice is made separately for each stmt, you can mix and match the three kinds of productions.
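
As a rough, dependency-free illustration of that loop, here's a hand-written recursive-descent parser. Note this is a different technique from Lark's LALR parser, and it only covers the subset used in the question: if/else with ==, print, numbers and strings. The tokenizer, the Parser class and the tuple output format are all hypothetical, chosen just for this sketch.

```python
import re

# Hypothetical tokenizer: number | string literal | punctuation | name (keywords come out as names)
TOKEN_RE = re.compile(r'\s*(?:(\d+)|("[^"]*")|(==|[{}();])|([A-Za-z_]\w*))')


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        num, string, punct, name = m.groups()
        if num is not None:
            tokens.append(("NUMBER", int(num)))
        elif string is not None:
            tokens.append(("STRING", string[1:-1]))  # drop the quotes
        elif punct is not None:
            tokens.append((punct, punct))
        else:
            tokens.append(("NAME", name))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", None)

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        self.i += 1
        return tok

    def start(self):
        # start: stmt*  (one choice per iteration, so kinds can be mixed)
        items = []
        while self.peek()[0] != "EOF":
            items.append(self.stmt())
        return items

    def stmt(self):
        # stmt: if | statement  (only print is modelled here)
        if self.peek() == ("NAME", "if"):
            return self.if_stmt()
        return self.print_stmt()

    def if_stmt(self):
        self.expect("NAME", "if")
        left = self.expr()
        self.expect("==")
        right = self.expr()
        then_block, else_block = self.block(), None
        if self.peek() == ("NAME", "else"):
            self.i += 1
            else_block = self.block()
        return ("if", left, right, then_block, else_block)

    def block(self):
        # "{" statement+ "}" as in the question: blocks hold print statements only
        self.expect("{")
        body = []
        while self.peek()[0] != "}":
            body.append(self.print_stmt())
        self.expect("}")
        return body

    def print_stmt(self):
        self.expect("NAME", "print")
        self.expect("(")
        value = self.expr()
        self.expect(")")
        self.expect(";")
        return ("print", value)

    def expr(self):
        kind, value = self.peek()
        if kind in ("NUMBER", "STRING"):
            self.i += 1
            return value
        raise SyntaxError(f"expected expression, got {value!r}")


program = '''
if 5 == 4 {
    print("TRUE");
}
else {
    print("FALSE");
}
print("Done!");
'''
print(Parser(tokenize(program)).start())
# [('if', 5, 4, [('print', 'TRUE')], [('print', 'FALSE')]), ('print', 'Done!')]
```

The while loop in start() is the hand-written counterpart of stmt*: because it calls stmt() afresh on every iteration, an if block can be followed by a print, and vice versa.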

Awkward naming choices aside, even after this short-term fix the grammar has other obvious deficiencies, such as no support for nested if blocks (an if body may only contain statements or exprs, never another if). Since your ultimate goals aren't clear, I'll avoid presumption and keep the scope to your immediate issue.
