My goal is to parse every character in the following string with the patterns I have created with PyParsing. I am trying to parse two nested structures, the control structure and the macro structure, and both span multiple lines.
"""
; Macros to verify assumptions about the data or code
table_width: MACRO
CURRENT_TABLE_WIDTH = \\1
if _NARG == 2
REDEF CURRENT_TABLE_START EQUS "\\2"
else
REDEF CURRENT_TABLE_START EQUS "._table_width\\@"
{CURRENT_TABLE_START}:
endc
ENDM
"""
These are my parsers. They work fine for parsing lines from a multi-file project up until I start trying to parse nested control and macro structures.
from pyparsing import (
    CaselessKeyword, CaselessLiteral, FollowedBy, Forward, LineEnd, Literal,
    OneOrMore, OpAssoc, Opt, QuotedString, SkipTo, Word, ZeroOrMore,
    hexnums, infix_notation, one_of, printables,
)

comment_parser = (Literal(";") + SkipTo(LineEnd()))
charmap_parser = CaselessKeyword("charmap") + QuotedString("\"") + \
    Literal(",").suppress() + Word(hexnums + "$") + Opt(comment_parser)
expression = infix_notation(Word(printables, exclude_chars="() ** ~ + - * / % & | ^ != == <= >= < > !"),
    [
        ("()", 2, OpAssoc.LEFT),
        ("**", 2, OpAssoc.LEFT),
        (one_of("~ + -"), 1, OpAssoc.RIGHT),
        (one_of("* / %"), 2, OpAssoc.LEFT),
        (one_of("<< >>"), 2, OpAssoc.LEFT),
        (one_of("& | ^"), 2, OpAssoc.LEFT),
        ("+ -", 2, OpAssoc.LEFT),
        ("!= == <= >= < >", 2, OpAssoc.LEFT),
        ("&& ||", 2, OpAssoc.LEFT),
        ("!", 1, OpAssoc.RIGHT),
    ])
elif_parser = CaselessKeyword("elif") + expression
if_parser = CaselessKeyword("if") + expression
include_parser = CaselessLiteral("include") + QuotedString("\"") + Opt(comment_parser)
include_parser.add_parse_action(parse_include)
label = Word(printables, excludeChars=":") + Literal(":")
newcharmap_parser = CaselessKeyword("newcharmap") + Word(printables) + Opt(comment_parser)
numeric_assignment = Word(printables) + Literal("=") + Word(printables)
popc = CaselessKeyword("popc") + Opt(comment_parser)
pushc = CaselessKeyword("pushc") + Opt(comment_parser)
redef = CaselessKeyword("redef") + Word(printables) + \
    (CaselessKeyword("equ") ^ CaselessKeyword("equs")) + \
    QuotedString("\"")
all_rgbasm_parsers = Forward()
control = Forward()
macro_parser = Forward()
all_rgbasm_parsers <<= (charmap_parser ^ comment_parser ^ include_parser ^ newcharmap_parser ^
                        numeric_assignment ^ popc ^ pushc ^ redef ^ control ^ macro_parser ^ label)
control <<= if_parser + OneOrMore(all_rgbasm_parsers) + Opt(elif_parser ^ CaselessKeyword("else")) + \
    ZeroOrMore(all_rgbasm_parsers) + CaselessKeyword("endc")
macro_parser <<= Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") + \
    OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))
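FollowedBy is a lookahead: it checks that its expression matches at the current position but consumes no input. A quick sketch, assuming pyparsing 3 and a hypothetical one-word body standing in for the real macro body, shows the difference between only peeking at "endm" and actually matching it:

```python
from pyparsing import CaselessKeyword, FollowedBy, ParseException, Word, alphas

ENDM = CaselessKeyword("endm")

# Lookahead only: succeeds when "endm" comes next, but leaves it in the input.
peek_only = Word(alphas) + FollowedBy(ENDM)
# Matching the keyword itself actually consumes "endm".
consume = Word(alphas) + ENDM

peeked = peek_only.parse_string("body ENDM").as_list()
consumed = consume.parse_string("body ENDM", parse_all=True).as_list()
print(peeked, consumed)

# With parse_all=True the unconsumed ENDM is left over, so parsing fails.
try:
    peek_only.parse_string("body ENDM", parse_all=True)
    peek_error = None
except ParseException as err:
    peek_error = err
    print(err)
```

In the last case the lookahead succeeds, yet the leftover ENDM makes the end-of-text check fail.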
I expect macro_parser to return a nested list of results from parsing the string above. Instead, it does not work: parsing fails with Expected end of text, found 'MACRO', which is a very unhelpful error message.
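When the top-level message is this terse, the exception object itself carries more context: the line and column of the failure, the full text of that line, and (via explain()) the stack of pyparsing expressions that were active. A small sketch, assuming pyparsing 3, using a hypothetical two-token grammar only to force a failure:

```python
from pyparsing import CaselessKeyword, ParseException, Word, alphas

# Hypothetical grammar: a word followed by "endm"; "macro" will not match.
grammar = Word(alphas) + CaselessKeyword("endm")
text = "table macro"

try:
    grammar.parse_string(text, parse_all=True)
    failure = None
except ParseException as err:
    failure = err
    print(f"line {err.lineno}, column {err.col}")  # where parsing stopped
    print(err.line)                                 # full text of that line
    print(err.explain())                            # message plus the expression stack
```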
If I remove label from all_rgbasm_parsers, I get an even worse message: Expected end of text, found 'table'
I get the same error message when trying to parse with this:
((Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") +
  OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))) ^ comment_parser)
I see nowhere in the expression above where it would expect a newline at the start of a line, though I may be overlooking something. It also appears that Word(printables, excludeChars=":") does not include the _ character when it parses, even though string.printable includes it.
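A quick check suggests the Word itself does keep the underscore. My reading (an assumption, not documented behavior) is that the "found '...'" part of a pyparsing 3 error message shows only a short run of letters and digits starting at the failure point, which would explain why table_width is reported as 'table':

```python
from pyparsing import Literal, Word, printables

# The same expression the question uses for labels and macro names.
name = Word(printables, exclude_chars=":")
label = name + Literal(":")

# parse_all is left off, so the trailing "MACRO" is simply ignored here.
tokens = label.parse_string("table_width: MACRO")
print(tokens.as_list())
```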
I am testing the parser with this:
test = """
; Macros to verify assumptions about the data or code
table_width: MACRO
CURRENT_TABLE_WIDTH = \\1
if _NARG == 2
REDEF CURRENT_TABLE_START EQUS "\\2"
else
REDEF CURRENT_TABLE_START EQUS "._table_width\\@"
{CURRENT_TABLE_START}:
endc
ENDM
"""
from pyparsing import Group, OneOrMore
from rgbasm_parsers import all_rgbasm_parsers

all_parsers = OneOrMore(Group(all_rgbasm_parsers))
print(all_parsers.parse_string(test, parseAll=True))
I have tested OneOrMore(Group(all_rgbasm_parsers)) with files that include no nested structures, and it gives me the correct results, so I do not think that code is the problem, though I may be wrong.
It may be that part of the problem is that the nested structures span multiple lines, but Expected end of text, found 'table' makes me think otherwise.
I think I might be using Forward wrong.
Any ideas?
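I found two things wrong.
First, some operator entries in the infix_notation list were missing one_of. A plain string such as "!= == <= >= < >" is treated as a single literal that only matches that exact text, so an operator like == was never recognized. The corrected expression below also drops the ("()", 2, ...) entry and adds the compound assignment operators: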
expression = infix_notation(
    Word(
        printables,
        exclude_chars=" ** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^="
    ),
    [
        ("**", 2, OpAssoc.LEFT),
        (one_of("~ + -"), 1, OpAssoc.RIGHT),
        (one_of("* / % *= /= %="), 2, OpAssoc.LEFT),
        (one_of("<< >> <<= >>="), 2, OpAssoc.LEFT),
        (one_of("& | ^ &= |= ^="), 2, OpAssoc.LEFT),
        (one_of("+ - += -="), 2, OpAssoc.LEFT),
        (one_of("!= == <= >= < >"), 2, OpAssoc.LEFT),
        (one_of("&& ||"), 2, OpAssoc.LEFT),
        ("!", 1, OpAssoc.RIGHT),
    ],
)
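The effect of the missing one_of can be reproduced with a tiny grammar. This sketch assumes pyparsing 3 and uses a hypothetical numeric operand instead of the question's Word(printables, ...):

```python
from pyparsing import OpAssoc, Word, infix_notation, nums, one_of

operand = Word(nums)

# A plain string becomes one Literal, so "+ -" only matches that exact text.
literal_ops = infix_notation(operand, [("+ -", 2, OpAssoc.LEFT)])
# one_of splits the string into alternatives, so either "+" or "-" matches.
either_op = infix_notation(operand, [(one_of("+ -"), 2, OpAssoc.LEFT)])

print(literal_ops.parse_string("1 + 2").as_list())  # stops after the first operand
print(either_op.parse_string("1 + 2").as_list())
```

Without parse_all the first parser silently stops after '1'; with parse_all=True it would raise the same kind of "Expected end of text" error seen in the question.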
Second, "endm" was never consumed, because FollowedBy only looks ahead without consuming input, which resulted in a ParseException.
macro_parser <<= (Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") +
                  OneOrMore(all_rgbasm_parsers) +
                  # FollowedBy(endm) + endm is redundant; matching the keyword consumes it
                  CaselessKeyword("endm"))
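For anyone hitting the same problem, here is a minimal, self-contained sketch of the same Forward-based layout, assuming pyparsing 3. The rules below (identifier, comment, condition, assignment, if_block, macro_block, statement, program) are a hypothetical, heavily simplified subset of the RGBASM syntax above, not the full rgbasm_parsers module. It uses MatchFirst (|) instead of Or (^), and plain values in place of the \1 and \2 macro arguments.

```python
from pyparsing import (
    CaselessKeyword, Forward, Group, OpAssoc, Opt, Suppress, Word,
    ZeroOrMore, alphanums, infix_notation, one_of, printables, rest_of_line,
)

# Hypothetical, heavily simplified subset of the grammar in the question.
IF, ELSE, ENDC = map(CaselessKeyword, ["if", "else", "endc"])
MACRO, ENDM = map(CaselessKeyword, ["macro", "endm"])

identifier = Word(alphanums + "_")
comment = Suppress(";" + rest_of_line)
condition = infix_notation(identifier, [(one_of("== !="), 2, OpAssoc.LEFT)])
assignment = Group(identifier + "=" + Word(printables))

# Declare the recursive rule first so the block rules can refer to it.
statement = Forward()
body = Group(ZeroOrMore(statement))

if_block = Group(IF + condition + body + Opt(ELSE + body) + ENDC)
# The closing keyword is matched (and consumed), not just peeked at.
macro_block = Group(identifier + Suppress(":") + MACRO + body + ENDM)

# Fill in the Forward; block rules are tried before the generic assignment.
statement <<= comment | if_block | macro_block | assignment
program = ZeroOrMore(statement)

SAMPLE = """
; Macros to verify assumptions about the data or code
table_width: MACRO
    CURRENT_TABLE_WIDTH = 1
    if _NARG == 2
        START = two
    else
        START = default
    endc
ENDM
"""

result = program.parse_string(SAMPLE, parse_all=True).as_list()
print(result)
```

Because statement is a Forward that is filled in after the block rules are built, an if_block can appear inside a macro_block body (and vice versa) to any depth. The ZeroOrMore in body stops at else, endc or ENDM simply because no statement alternative matches those words, and the enclosing rule then consumes its own closing keyword.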