John Glen

Reputation: 951

Problem with Forward declarations and multiline nested structures in PyParsing

My goal is to parse every character in the following string with the patterns I have created with PyParsing. I am trying to parse two nested structures, the control structure and the macro structure, both of which span multiple lines.

    """
    ; Macros to verify assumptions about the data or code

    table_width: MACRO
    CURRENT_TABLE_WIDTH = \\1
    if _NARG == 2
    REDEF CURRENT_TABLE_START EQUS "\\2"
    else
    REDEF CURRENT_TABLE_START EQUS "._table_width\@"
    {CURRENT_TABLE_START}:
    endc
    ENDM
    """

These are my parsers. They work fine for parsing lines from a multi-file project up until I start trying to parse nested control and macro structures.

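from pyparsing import (CaselessKeyword, CaselessLiteral, FollowedBy, Forward, LineEnd, Literal,
                       OneOrMore, OpAssoc, Opt, QuotedString, SkipTo, Word, ZeroOrMore,
                       hexnums, infix_notation, one_of, printables)

comment_parser = (Literal(";") + SkipTo(LineEnd()))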

charmap_parser = CaselessKeyword("charmap") + QuotedString("\"") + \
                 Literal(",").suppress() + Word(hexnums + "$") + Opt(comment_parser)

expression = infix_notation(Word(printables, exclude_chars="() ** ~ + - * / % & | ^ != == <= >= < > !"),
                            [
                                ("()", 2, OpAssoc.LEFT),
                                ("**", 2, OpAssoc.LEFT),
                                (one_of("~ + -"), 1, OpAssoc.RIGHT),
                                (one_of("* / %"), 2, OpAssoc.LEFT),
                                (one_of("<< >>"), 2, OpAssoc.LEFT),
                                (one_of("& | ^"), 2, OpAssoc.LEFT),
                                ("+ -", 2, OpAssoc.LEFT),
                                ("!= == <= >= < >", 2, OpAssoc.LEFT),
                                ("&& ||", 2, OpAssoc.LEFT),
                                ("!", 1, OpAssoc.RIGHT),
                            ])

elif_parser = CaselessKeyword("elif") + expression

if_parser = CaselessKeyword("if") + expression

include_parser = CaselessLiteral("include") + QuotedString("\"") + Opt(comment_parser)
include_parser.add_parse_action(parse_include)

label = Word(printables, excludeChars=":") + Literal(":")

newcharmap_parser = CaselessKeyword("newcharmap") + Word(printables) + Opt(comment_parser)

numeric_assignment = Word(printables) + Literal("=") + Word(printables)

popc = CaselessKeyword("popc") + Opt(comment_parser)

pushc = CaselessKeyword("pushc") + Opt(comment_parser)

redef = CaselessKeyword("redef") + Word(printables) + \
        (CaselessKeyword("equ") ^ CaselessKeyword("equs")) + \
        QuotedString("\"")

all_rgbasm_parsers = Forward()

control = Forward()

macro_parser = Forward()

all_rgbasm_parsers <<= (charmap_parser ^ comment_parser ^ include_parser ^ newcharmap_parser ^
                        numeric_assignment ^ popc ^ pushc ^ redef ^ control ^ macro_parser ^ label)

control <<= if_parser + OneOrMore(all_rgbasm_parsers) + Opt(elif_parser ^ CaselessKeyword("else")) + \
    ZeroOrMore(all_rgbasm_parsers) + CaselessKeyword("endc")


macro_parser <<= Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") + \
               OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))

I expect the macro_parser to return a nested list of results from parsing the above string.

The problem is that macro_parser does not work. I end up with the error "Expected end of text, found 'MACRO'", which is not very helpful.

If I remove label from all_rgbasm_parsers, I get an even less helpful message: "Expected end of text, found 'table'". I get the same error message when trying to parse with this:

((Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") +
               OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))) ^ comment_parser)

I see nowhere in the expression above where it would expect a newline at the start of a line. I may be overlooking something. It appears that Word(printables, excludeChars=":") does not include the character _ when it parses despite the fact that string.printable includes it.
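The underscore suspicion can be checked in isolation. The following is a minimal sketch, assuming pyparsing 3.x; the variable name `name` is made up for illustration. It suggests that Word does consume the underscore, so the shortened 'table' in the error message is likely a matter of how the exception text is formatted rather than what Word matched.

```python
from pyparsing import Word, printables

# Same token definition as the label/macro name in the grammar above.
name = Word(printables, exclude_chars=":")

# Parse only the leading token; the rest of the line is left untouched.
result = name.parse_string("table_width: MACRO")

print("_" in printables)   # True
print(result.as_list())    # ['table_width']
```

If this holds, the underscore is not where the match stops, and the cause of the failure has to be elsewhere in the grammar.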

I am testing the parser with this


    test = """
    ; Macros to verify assumptions about the data or code

    table_width: MACRO
    CURRENT_TABLE_WIDTH = \\1
    if _NARG == 2
    REDEF CURRENT_TABLE_START EQUS "\\2"
    else
    REDEF CURRENT_TABLE_START EQUS "._table_width\@"
    {CURRENT_TABLE_START}:
    endc
    ENDM
    """
    from pyparsing import Group, OneOrMore
    from rgbasm_parsers import all_rgbasm_parsers
    all_parsers = OneOrMore(Group(all_rgbasm_parsers))
    print(all_parsers.parse_string(test, parseAll=True))

I have tested OneOrMore(Group(all_rgbasm_parsers)) with files that include no nested structures, and that gives me the correct results, so I do not think that that code is the problem, though I may be wrong.

It may be that part of the problem is that the nested structures span multiple lines, but "Expected end of text, found 'table'" makes me think otherwise.

I think I might be using Forward wrong.
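For comparison, here is a minimal sketch of the usual Forward pattern for a self-nesting block, assuming pyparsing 3.x. The toy grammar (`plain`, `block`, `statement`) is hypothetical and much simpler than the RGBASM grammar above; it only shows the declare-first, define-later shape.

```python
from pyparsing import (CaselessKeyword, Forward, Group, OneOrMore, Word,
                       ZeroOrMore, alphas)

IF, ELSE, ENDC = map(CaselessKeyword, ["if", "else", "endc"])

# Declare the recursive rule first so that block can refer to it.
statement = Forward()

# Hypothetical plain statement: any word that is not a control keyword.
plain = ~(IF | ELSE | ENDC) + Word(alphas)

# A block contains zero or more statements, which may themselves be blocks.
block = Group(IF + Word(alphas) + Group(ZeroOrMore(statement)) + ENDC)

# Define the rule after everything it depends on exists.
statement <<= block | plain

text = "a if cond b if other c endc endc d"
result = OneOrMore(statement).parse_string(text, parse_all=True)
print(result.as_list())
# ['a', ['if', 'cond', ['b', ['if', 'other', ['c'], 'endc']], 'endc'], 'd']
```

The grammar in the question follows the same declare-then-define shape, which suggests the Forward usage itself is not necessarily the problem.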

Any ideas?

Upvotes: 0

Views: 81

Answers (1)

John Glen

Reputation: 951

I found two things wrong.

First, some operator entries in the infix_notation call were missing one_of, so each space-separated string was treated as a single literal:

expression = infix_notation(Word(
    printables,
    exclude_chars=" ** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^="
),
    [
        ("**", 2, OpAssoc.LEFT),
        (one_of("~ + -"), 1, OpAssoc.RIGHT),
        (one_of("* / % *= /= %="), 2, OpAssoc.LEFT),
        (one_of("<< >> <<= >>="), 2, OpAssoc.LEFT),
        (one_of("& | ^ &= |= ^="), 2, OpAssoc.LEFT),
        (one_of("+ - += -="), 2, OpAssoc.LEFT),
        (one_of("!= == <= >= < >"), 2, OpAssoc.LEFT),
        (one_of("&& ||"), 2, OpAssoc.LEFT),
        ("!", 1, OpAssoc.RIGHT),
    ])
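The effect of the missing one_of can be seen in a small sketch, assuming pyparsing 3.x. The operand `Word(nums)` and the helper `parses` are made up for illustration and are not part of the original grammar.

```python
from pyparsing import OpAssoc, ParseException, Word, infix_notation, nums, one_of

operand = Word(nums)  # hypothetical operand, for illustration only

# A bare string becomes one literal: this level only matches the text "+ -".
broken = infix_notation(operand, [("+ -", 2, OpAssoc.LEFT)])

# one_of splits on whitespace, so this level matches "+" or "-".
fixed = infix_notation(operand, [(one_of("+ -"), 2, OpAssoc.LEFT)])


def parses(parser, text):
    """Return True if parser consumes all of text."""
    try:
        parser.parse_string(text, parse_all=True)
        return True
    except ParseException:
        return False


print(parses(broken, "1 + 2"))  # False
print(parses(fixed, "1 + 2"))   # True
print(fixed.parse_string("1 - 2", parse_all=True).as_list())  # [['1', '-', '2']]
```

With the bare string, the parser stops after the first operand and parse_all reports leftover input, which is the same kind of "Expected end of text" failure described in the question.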

Second, the "endm" was not being consumed, which resulted in a ParseException. FollowedBy only looks ahead, so the keyword has to be matched explicitly after it:

macro_parser <<= (Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") + 
                  OneOrMore(all_rgbasm_parsers) +
                  FollowedBy(CaselessKeyword("endm"))) + CaselessKeyword("endm")
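The difference between looking ahead and consuming can be reproduced on its own. This is a minimal sketch, assuming pyparsing 3.x; `body_line` is a hypothetical stand-in for the real statement parsers.

```python
from pyparsing import (CaselessKeyword, FollowedBy, OneOrMore, ParseException,
                       Word, alphas)

endm = CaselessKeyword("endm")

# Hypothetical body statement: any word that is not the closing keyword.
body_line = ~endm + Word(alphas)

# FollowedBy checks that endm comes next but leaves it in the input.
lookahead_only = OneOrMore(body_line) + FollowedBy(endm)

# Matching endm directly consumes it.
consuming = OneOrMore(body_line) + endm

text = "nop nop ENDM"

try:
    lookahead_only.parse_string(text, parse_all=True)
    lookahead_failed = False
except ParseException as err:
    lookahead_failed = True
    print(err)  # complains about the unconsumed ENDM

tokens = consuming.parse_string(text, parse_all=True).as_list()
print(tokens[:2])  # ['nop', 'nop']
```

Once the keyword is matched directly, the preceding FollowedBy is redundant and could probably be dropped, although keeping it does no harm.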

Upvotes: 1
