Rayyan Cyclegar

Reputation: 59

Lark transformer returning Tree while parsing multiple statements

I'm making a programming language using Lark, and I'm trying to parse multiple statements from a file. When I parse

print("HI");
print("HI");

It returns

Tree('start', ['HI', 'HI'])

But when I parse

print("Hi");

It returns

Hi

Here's roughly what my grammar looks like

?start: expr
      | statement*
?expr: STRING -> string
?statement: "print" "(" expr ")" ";" -> print_statement

%import common.ESCAPED_STRING -> STRING 
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL

And here's how my transformer file works

from lark import Transformer, v_args
class MainTransformer(Transformer):
  string = str
  def print_statement(self, value):
    value = str(value).strip('"')
    return value

And here's how my indenter code works

from lark.indenter import Indenter

class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

And here's my main.py file

from lark import Lark
from transformer import MainTransformer
from indenter import MainIndenter
parser = Lark.open("main_parser.lark", parser="lalr", transformer=MainTransformer(), postlex=MainIndenter())
main_parser = parser.parse

input_str = '''
print("HI");
print("HI");
'''
print(main_parser(input_str))

Help would be appreciated, thanks!

Upvotes: 0

Views: 2401

Answers (1)

I had a play with this. It would have been a whole lot easier, and about 15 minutes quicker for me, if you had put a complete minimal reproducible example (MRE) in your question. Please do that next time, because I don't intend to spend 15 minutes recreating something that should be in your question. In particular, make it a single block of code complete with all the imports it needs, and if you feel the urge to split it up with blocks of text, please resist and use Python comments in place of that text.

So here's a free MRE.

One thing: I didn't get the same result as you for a single print statement. I got a Token of type STRING with the value "HI".

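First I added the -> statements alias to the statement* alternative of start. Because start is declared with ?, Lark inlines it when it has a single child but keeps a Tree('start', ...) when it has several, and with no matching function in the transformer that Tree is returned untouched. The alias gives the transformer a statements function to handle it.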

Then I removed string = str because that just isn't right: the function is handed value, which is always a list, so str converts the whole list into its printed representation as a string.

Then I added a string() transformer function and a statements() one. Making them print their input and return values makes it a bit easier to see what's going on. On the project where I used Lark, I kept prints like these (identifying the transformer function and its input/output) in place until I'd got it all stable and working, so I could check, for example, that an identifier was correctly being transformed to a URI, or that the two values for an addition were being added and a single value returned.

A transformer function takes value, which is always a list of the rule's children. For a unary rule like print_statement or string, it returns the first (and only) item in that list. A function for an operator that takes two inputs would get the two things to add as two entries in value, add them, and return the result; that's transformation.

A Token is a str with metadata; you can see the results when you run this code.

from lark import Lark
from lark.indenter import Indenter
from lark import Transformer, v_args

grammar = """
?start: expr
      | statement* -> statements // ADDED
?expr: STRING -> string
?statement: "print" "(" expr ")" ";" -> print_statement

%import common.ESCAPED_STRING -> STRING 
%declare _INDENT _DEDENT
%import common.WS_INLINE
%ignore WS_INLINE
%import common.NEWLINE -> _NL
%ignore _NL
"""

class MainIndenter(Indenter):
    NL_type = '_NL'
    OPEN_PAREN_types = ['LPAR', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RBRACE']
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8


class MainTransformer(Transformer):
#    string = str # REMOVED
    def string(self,value): # ADDED
        print( f"string {value=}" )
        res = value[0]  # this seems like quite a common thing to do for a unary thing like string - return value[0]
        print( f"string returning {res}" )
        return res
    
    def print_statement(self, value):
        print( f"print_statement {value=}" )
#        value = str(value).strip('"')
        res = value[0]  # this seems like quite a common thing to do for a unary thing like print - return value[0]
        print( f"print_statement returning {res}" )
        return res
        
    def statements(self,value): # ADDED
        print( f"statements {value=}" )
        for i,v in enumerate(value):
            print( f"  {i=} {v=}" )
        return value

parser = Lark(grammar, parser="lalr", transformer=MainTransformer(), postlex=MainIndenter())

main_parser = parser.parse

hiho_input_str = '''
print("HI");
print("HO");
print("HI");
print("HO");
'''

hihoresult = main_parser(hiho_input_str)
print( "hiho result=")
for i,hiho in enumerate(hihoresult):
    print(f"  {i} {hiho}")
print()

hi_input_str = '''
print("HI");
'''

print("Hi result=",main_parser(hi_input_str))

Results:

string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
string value=[Token('STRING', '"HO"')]
string returning "HO"
print_statement value=[Token('STRING', '"HO"')]
print_statement returning "HO"
statements value=[Token('STRING', '"HI"'), Token('STRING', '"HO"'), Token('STRING', '"HI"'), Token('STRING', '"HO"')]
  i=0 v=Token('STRING', '"HI"')
  i=1 v=Token('STRING', '"HO"')
  i=2 v=Token('STRING', '"HI"')
  i=3 v=Token('STRING', '"HO"')
hiho result=
  0 "HI"
  1 "HO"
  2 "HI"
  3 "HO"

string value=[Token('STRING', '"HI"')]
string returning "HI"
print_statement value=[Token('STRING', '"HI"')]
print_statement returning "HI"
statements value=[Token('STRING', '"HI"')]
  i=0 v=Token('STRING', '"HI"')
Hi result= [Token('STRING', '"HI"')]

If you want to change what string returns, make that change first, because it ripples up through every transformer function that consumes the result.

Upvotes: 1
