Reputation: 485
I'm using parsimonious (a Python PEG parser library) to parse text that looks like this:
text = """
block block_name_0
{
foo
}
block block_name_1
{
bar
}
"""
The whole text is a series of blocks, each with a simple body requirement (the body must be alphanumeric). Here's the grammar:
from parsimonious.grammar import Grammar

grammar = Grammar(r"""
file = block+
block = _ "block" _ alphanum _ start_brace _ block_body _ end_brace _
block_body = alphanum+
alphanum = ~"[_A-Za-z0-9]+"
_ = ~"[\\n\\s]*"
start_brace = "{"
end_brace = "}"
""")
print(grammar.parse(text))
The problem I'm having is I get a useless error message if there's a parsing error in any block after the first one. To give an example, consider the following text:
text = """
block block_name_0
{
!foo
}
block block_name_1
{
bar
}
"""
This gives a useful error message:
[omitted stack trace]
File "/lib/parsimonious/expressions.py", line 127, in match
raise error
parsimonious.exceptions.ParseError: Rule 'block_body' didn't match at '!foo
}
However, if I have the following text:
text = """
block block_name_0
{
foo
}
block block_name_1
{
!bar
}
"""
I get this error:
File "/lib/parsimonious/expressions.py", line 112, in parse
raise IncompleteParseError(text, node.end, self)
parsimonious.exceptions.IncompleteParseError: Rule 'file' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with 'block block_name_1
{' (line 7, column 1).
It looks like the parser matches the first block, but when it fails on the second block it doesn't treat the whole parse as a failure, which is what I want. I'd like an error similar to the one for block 0, so I can see exactly what went wrong inside the block, not just that the block as a whole couldn't be parsed.
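A minimal sketch of what seems to be going on, using plain `re` instead of parsimonious (the `BLOCK` pattern and the `one_or_more_blocks` helper are made up for illustration and only approximate the grammar): a greedy one-or-more repetition ends quietly at the first repetition that fails, so the top rule still counts as matched and the rest of the input is simply left over.

```python
import re

# Hypothetical stand-in for the `block` rule, collapsed into one regex.
BLOCK = re.compile(r"\s*block\s+\w+\s*\{\s*\w+\s*\}\s*")

def one_or_more_blocks(text):
    """Greedy `block+`: return (blocks matched, offset where matching stopped)."""
    pos = 0
    count = 0
    while True:
        m = BLOCK.match(text, pos)
        if m is None:
            break  # a failed repetition just ends the loop, it is not an error
        pos = m.end()
        count += 1
    return count, pos

text = "\nblock block_name_0\n{\nfoo\n}\nblock block_name_1\n{\n!bar\n}\n"
count, pos = one_or_more_blocks(text)
# One block matched; the leftover input starts at the second block,
# not at the offending '!bar'.
print(count, repr(text[pos:pos + 18]))
```

The position where the second block actually broke is discarded once the repetition gives up, which would explain why the reported location is the start of the block.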
Any help would be greatly appreciated!
Upvotes: 0
Views: 468
Reputation: 943
This is not an answer for parsimonious, but for good error reporting I would suggest trying textX or, directly, its underlying PEG parser Arpeggio (disclaimer: I'm the author of both libraries).
Using textX:
from textx.metamodel import metamodel_from_str
grammar = """
Program: blocks+=Block ;
Block:
'block' name=ID '{'
body=Body
'}'
;
Body: ID+ ;
"""
text = """
block block_name_0
{
foo
}
block block_name_1
{
!bar
}
"""
mm = metamodel_from_str(grammar)
program = mm.model_from_str(text)
textX/Arpeggio will parse as far as it can and pinpoint the exact location where the error is:
textx.exceptions.TextXSyntaxError:
Expected ID at position (9, 5) => 'e_1 { *!bar } '.
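To illustrate the general idea of reporting the farthest point a parser reached, here is a small hand-rolled sketch in plain Python. It is not how textX or Arpeggio are implemented; `TinyParser` and all of its methods are hypothetical names, and the token patterns only approximate the grammar above.

```python
import re

WS = re.compile(r"\s*")

class TinyParser:
    """Toy recursive-descent parser that remembers its farthest failure."""

    # (pattern, human-readable name) for each token of one block, in order.
    BLOCK_TOKENS = [
        (r"block\b", "'block'"),
        (r"\w+", "ID"),
        (r"\{", "'{'"),
        (r"\w+", "ID"),
        (r"\}", "'}'"),
    ]

    def __init__(self, text):
        self.text = text
        self.far_pos = -1         # farthest offset at which a token failed
        self.far_expected = None  # what was expected at that offset

    def token(self, pattern, pos, expected):
        pos = WS.match(self.text, pos).end()  # skip leading whitespace
        m = re.compile(pattern).match(self.text, pos)
        if m is None:
            if pos > self.far_pos:  # keep only the deepest failure
                self.far_pos, self.far_expected = pos, expected
            return None
        return m.end()

    def block(self, pos):
        for pattern, expected in self.BLOCK_TOKENS:
            pos = self.token(pattern, pos, expected)
            if pos is None:
                return None
        return pos

    def parse(self):
        pos, count = 0, 0
        while (end := self.block(pos)) is not None:
            pos, count = end, count + 1
        if count == 0 or WS.match(self.text, pos).end() != len(self.text):
            # Report the farthest failure, not where the repetition stopped.
            line = self.text.count("\n", 0, self.far_pos) + 1
            col = self.far_pos - self.text.rfind("\n", 0, self.far_pos)
            raise ValueError(
                f"Expected {self.far_expected} at line {line}, column {col}")
        return count

bad = "block a\n{\n  foo\n}\nblock b\n{\n  !bar\n}\n"
try:
    TinyParser(bad).parse()
except ValueError as exc:
    message = str(exc)
    print(message)
```

The design choice is that every failed token records its offset, and the error raised at the end uses the largest recorded offset, so the message points at `!bar` rather than at the start of the second block.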
With textX you also get an AST for free, so you can, for example, do:
for block in program.blocks:
print(block.name, ':', block.body)
And for debugging/investigation purposes you also have a nice visualization of grammars and models.
Upvotes: 0