Reputation: 59
I'm making a programming language using Python and the Lark
library for parsing. When I parse the following:
if 5 == 4 {
    print("TRUE");
}
else {
    print("FALSE");
}
print("Done!");
it raises the following error:
PS E:\ParserAndLexer> & C:/Python38/python.exe e:/ParserAndLexer/lite/lite_transformer.py
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\lark\lexer.py", line 416, in lex
    for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
  File "C:\Python38\lib\site-packages\lark\lexer.py", line 200, in lex
    raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
lark.exceptions.UnexpectedCharacters: No terminal defined for 'p' at line 7 col 1

print("HI");
^

Expecting: {'IF'}

Previous tokens: Token('RBRACE', '}')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "e:/ParserAndLexer/lite/lite_transformer.py", line 80, in <module>
    tree = parser.parse(lite_code)
  File "C:\Python38\lib\site-packages\lark\lark.py", line 464, in parse
    return self.parser.parse(text, start=start)
  File "C:\Python38\lib\site-packages\lark\parser_frontends.py", line 148, in parse
    return self._parse(token_stream, start, set_parser_state)
  File "C:\Python38\lib\site-packages\lark\parser_frontends.py", line 63, in _parse
    return self.parser.parse(input, start, *args)
  File "C:\Python38\lib\site-packages\lark\parsers\lalr_parser.py", line 35, in parse
    return self.parser.parse(*args)
  File "C:\Python38\lib\site-packages\lark\parsers\lalr_parser.py", line 86, in parse
    for token in stream:
  File "C:\Python38\lib\site-packages\lark\indenter.py", line 32, in _process
    for token in stream:
  File "C:\Python38\lib\site-packages\lark\lexer.py", line 431, in lex
    raise UnexpectedToken(t, e.allowed, state=e.state)
lark.exceptions.UnexpectedToken: Unexpected token Token('NAME', 'print') at line 7, column 1.
Expected one of:
    * IF
I can't figure out why this is happening. Here is roughly what my code looks like:
from lark import Lark, Transformer, v_args
from lark.indenter import Indenter

class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_TYPE = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

@v_args(inline=True)
class MainTransformer(Transformer):
    def __init__(self):
        ...

    def number(self, value):
        return Integer(value)

    def string(self, value):
        value = str(value).strip('"')
        return String(value)

    def div(self, val1, val2):
        return Div(val1, val2)

    def print_statement(self, value):
        return Print(value)

    def if_statement(self, expr1, expr2, eval_expr):
        return If(expr1, expr2, eval_expr)

    def if_else_statement(self, expr1, expr2, eval_expr, else_statement):
        return If(expr1, expr2, eval_expr, else_statement)

    def if_statements(self, *values):
        for value in values:
            value.eval()

    def statement(self, *values):
        for value in values:
            value.eval()

grammar = '''
?start: expr*
    | statement* -> statement
    | if* -> if_statements

?if : "if" expr "==" expr "{" statement+ "}" -> if_statement
    | "if" expr "==" expr "{" expr+ "}" -> if_statement
    | "if" expr "==" expr "{" statement+ "}" "else" "{" statement+ "}" -> if_else_statement

?statement: "print" "(" expr ")" ";" -> print_statement
    | "input" "(" expr ")" ";" -> input_statement
    | NAME "=" expr ";" -> assign_var
    | NAME "=" "input" "(" expr ")" ";" -> var_input_statement

?expr: STRING -> string
    | NUMBER -> number
    | NAME -> get_var

%import common.ESCAPED_STRING -> STRING
%import common.NUMBER
%import common.CNAME -> NAME
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL
'''

class Print():
    def __init__(self, value):
        self.value = value

    def eval(self):
        return print(self.value.eval())


class Input():
    def __init__(self, value):
        self.value = value

    def eval(self):
        return input(self.value.eval())


class String():
    def __init__(self, value):
        self.value = str(value).strip('"')

    def eval(self):
        return self.value


class Integer():
    def __init__(self, value):
        self.value = int(value)

    def eval(self):
        return self.value


class If():
    def __init__(self, expr1, expr2, eval_expr, else_statement=None):
        self.expr1 = expr1
        self.expr2 = expr2
        self.eval_expr = eval_expr
        self.else_statement = else_statement

    def eval(self):
        if self.expr1.eval() == self.expr2.eval():
            return self.eval_expr.eval()
        else:
            if self.else_statement is None:
                return
            else:
                return self.else_statement.eval()

parser = Lark(grammar, parser='lalr', postlex=MainIndenter())

test_input = '''
if 5 == 5 {
    print("True");
}
else {
    print("False");
}
print("Done");
'''

if __name__ == '__main__':
    tree = parser.parse(test_input)
    print(MainTransformer().transform(tree))
Upvotes: 1
Views: 1650
Reputation: 57344
I'm not familiar with lark, but this looks wrong:
?start: expr*
    | statement* -> statement
    | if* -> if_statements
What this says is "expand the start rule to zero or more exprs, zero or more statements, or zero or more ifs". In other words, a whole program must consist of a single kind of production: the grammar doesn't support mixing the three kinds together, as the source string you're attempting to parse does. If you start with an if, the rest of the program has to be all ifs, so throwing in a statement such as print("Done!"); is prohibited (the error message says as much: it's expecting another if).
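As a loose analogy (this is not how Lark works internally), you can label each top-level item with a letter and treat the start rule as a regular expression over those labels. The labels E, S and I and the helper name accepted are made up for this illustration; only the shape of the alternation comes from the grammar above.

```python
import re

# Hypothetical one-letter labels for the kinds of top-level item:
# E = expr, S = statement, I = if.
# The pattern mirrors the alternation in the start rule:
#   expr* | statement* | if*
START_RULE = re.compile(r"E*|S*|I*")


def accepted(kinds):
    """Return True if the whole sequence of labels fits the start rule."""
    return START_RULE.fullmatch(kinds) is not None


only_ifs = accepted("II")         # two if blocks in a row
only_statements = accepted("SS")  # two print/assignment statements
if_then_print = accepted("IS")    # an if/else block followed by a print

print(only_ifs, only_statements, if_then_print)
```

The two uniform sequences fit the pattern, while the if followed by a print does not, which is the same shape as the program in the question.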
You can fix this with something like:
?start: stmt*
?stmt: expr
    | statement -> statement
    | if -> if_statements
This grammar says "expand the start rule to zero or more stmts, where a stmt is defined as an expr, a statement or an if". In this manner, you can mix and match the three types of productions.
Awkward naming choices aside, the grammar still has other obvious deficiencies after this short-term fix, such as the inability to support nested if blocks. Since your ultimate goals aren't clear, I'll avoid presuming and keep the scope to your immediate issue.
Upvotes: 2