Reputation: 59
I'm making a programming language using Lark, and I'm trying to parse multiple statements from a file. When I parse
print("HI");
print("HI");
It returns
Tree('start', ['HI', 'HI'])
But when I parse
print("Hi");
It returns
Hi
Here's roughly what my grammar looks like:
?start: expr
      | statement*
?expr: STRING -> string
?statement: "print" "(" expr ")" ";" -> print_statement
%import common.ESCAPED_STRING -> STRING
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL
And here's how my transformer file works:
from lark import Transformer, v_args

class MainTransformer(Transformer):
    string = str

    def print_statement(self, value):
        value = str(value).strip('"')
        return value
And here's how my indenter code works:
from lark.indenter import Indenter

class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_TYPE = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8
And here's my main.py file:
from lark import Lark
from transformer import MainTransformer
from indenter import MainIndenter
parser = Lark.open("main_parser.lark", parser="lalr", transformer=MainTransformer(), postlex=MainIndenter())
main_parser = parser.parse
input_str = '''
print("HI");
print("HI");
'''
print(main_parser(input_str))
Help would be appreciated, thanks!
Upvotes: 0
Views: 2401
Reputation: 6826
I had a play with this, which would have been a whole lot easier, and about 15 minutes quicker, if you had put a complete minimal reproducible example (MRE) in your question. Please do that next time; I don't intend to spend 15 minutes recreating something that should already be in the question. In particular, make it a single block of code, complete with all the imports it needs. If you feel the urge to split it up with blocks of text, please resist and use Python comments in place of that text.
So here's a free MRE.
One thing: I didn't get the same result as you for a single print statement. I got a Token of type STRING with the value "HI".
First I added the -> statements alias to the statement* alternative in the grammar.
Then I removed string = str, because that just isn't right: it converts value (which is always a list) into a string containing the literal text of that list.
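To illustrate that point, here is a small sketch that needs nothing but the standard library. It assumes, as described above, that the callback is handed a list of children. A plain string stands in for the Lark token, so the exact text differs from what Lark itself would show.

```python
# A rule callback receives the rule's children as a list.
# A plain str stands in for the Lark token here (an assumption for illustration).
children = ['"HI"']

# What string = str effectively does: str() is applied to the whole list.
as_text = str(children)
print(as_text)

# Stripping quotes afterwards has no effect, because the text
# starts with '[' and ends with ']', not with a double quote.
print(as_text.strip('"') == as_text)

# Taking the single child out of the list first gives the expected value.
print(children[0].strip('"'))
```

The last line is the pattern the rest of this answer uses: index into the list first, then work on the item.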
Then I added a string() transformer method and a statements() transformer method. Making them print their input and return values makes it a bit easier to see what's going on. On the project where I used Lark, I kept prints like these, identifying the transformer function and its input and output, until everything was stable and working. That way I could check, for example, that an identifier was correctly being transformed to a URI, or that the two values for an addition were being added and a single value returned.
Each transformer function takes value, which is always a list. For a unary operator like print or string, it returns the first (and only) item in that list. An operator that takes two inputs, such as addition, would get its two operands as two entries in value, add them, and return the result; that's transformation.
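A minimal sketch of that calling convention, with the callbacks invoked by hand rather than through Lark so the lists are easy to see. The class name SketchCallbacks and the add method are made up for illustration; the grammar in the question has no addition rule.

```python
class SketchCallbacks:
    """Hypothetical callbacks, written the way Transformer methods are."""

    def string(self, value):
        # Unary rule: value is a one-item list, so return that item.
        return value[0]

    def add(self, value):
        # Binary rule: value holds the two already-transformed operands.
        left, right = value
        return left + right


callbacks = SketchCallbacks()

# Lark would make these calls itself, children first; here they are made by hand.
unary_result = callbacks.string(['"HI"'])
binary_result = callbacks.add([2, 3])

print(unary_result)
print(binary_result)
```

Whatever a callback returns becomes one entry in the list passed to the callback of the enclosing rule.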
A Token is a str with metadata; you can see the results when you run this code:
from lark import Lark
from lark.indenter import Indenter
from lark import Transformer, v_args
grammar = """
?start: expr
      | statement* -> statements // ADDED
?expr: STRING -> string
?statement: "print" "(" expr ")" ";" -> print_statement
%import common.ESCAPED_STRING -> STRING
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL
"""
class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_type = '_INDENT' # was INDENT_TYPE in the question; Indenter expects INDENT_type
    DEDENT_type = '_DEDENT'
    tab_len = 8

class MainTransformer(Transformer):
    # string = str # REMOVED
    def string(self,value): # ADDED
        print( f"string {value=}" )
        res = value[0] # this seems like quite a common thing to do for a unary thing like string - return value[0]
        print( f"string returning {res}" )
        return res

    def print_statement(self, value):
        print( f"print_statement {value=}" )
        # value = str(value).strip('"')
        res = value[0] # this seems like quite a common thing to do for a unary thing like print - return value[0]
        print( f"print_statement returning {res}" )
        return res

    def statements(self,value): # ADDED
        print( f"statements {value=}" )
        for i,v in enumerate(value):
            print( f" {i=} {v=}" )
        return value

parser = Lark(grammar, parser="lalr", transformer=MainTransformer(), postlex=MainIndenter())
main_parser = parser.parse
hiho_input_str = '''
print("HI");
print("HO");
print("HI");
print("HO");
'''
hihoresult = main_parser(hiho_input_str)
print( "hiho result=")
for i,hiho in enumerate(hihoresult):
    print(f" {i} {hiho}")
print()
hi_input_str = '''
print("HI");
'''
print("Hi result=",main_parser(hi_input_str))
Results:
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
statements value=[Token('STRING', '"HI"'), Token('STRING', '"HO"'), Token('STRING', '"HI"'), Token('STRING', '"HO"')]
i=0 v=Token('STRING', '"HI"')
i=1 v=Token('STRING', '"HO"')
i=2 v=Token('STRING', '"HI"')
i=3 v=Token('STRING', '"HO"')
hiho result=
0 "HI"
1 "HO"
2 "HI"
3 "HO"
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
statements value=[Token('STRING', '"HI"')]
i=0 v=Token('STRING', '"HI"')
Hi result= [Token('STRING', '"HI"')]
If you think you might want to change what string() returns, make that change first, because it ripples up through every transformer method that receives the result.
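For instance, stripping the quotes in string() means every callback above it receives the clean text. The sketch below calls the callbacks by hand, children first, in the same order the results above show. FakeToken is a hypothetical stand-in for a Lark Token (a str subclass carrying a type), used only so the example runs without Lark installed.

```python
class FakeToken(str):
    """Hypothetical stand-in for a Lark token: a str plus a type label."""

    def __new__(cls, type_, value):
        obj = super().__new__(cls, value)
        obj.type = type_
        return obj


class SketchCallbacks:
    def string(self, value):
        # Changed here, at the lowest level: drop the surrounding quotes.
        return value[0][1:-1]

    def print_statement(self, value):
        # Unchanged: still returns its only child, which is now unquoted.
        return value[0]

    def statements(self, value):
        # Unchanged: returns the list of statement results.
        return value


cb = SketchCallbacks()
tokens = [FakeToken("STRING", '"HI"'), FakeToken("STRING", '"HO"')]

# Children first: string, then print_statement, then statements.
printed = [cb.print_statement([cb.string([tok])]) for tok in tokens]
result = cb.statements(printed)
print(result)
```

Only string() had to change; print_statement() and statements() pick up the unquoted text without being edited.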
Upvotes: 1