Reputation: 354
This is kind of duplicate to FParsec identifiers vs keywords, but the answer there isn't helpful in my case.
A common problem in writing parsers for programming languages is handling syntax errors that occur when keywords are misplaced in the code.
I have seen two approaches to this problem in FParsec, but none is really satisfying.
Approach 1: Write a grammar that allows choices between keywords and other literals where a conflict is possible (this is the accepted solution of the above-mentioned duplicate).
In my grammar, there are contexts where keywords never occur. So, rewriting my grammar to allow choices between literals and keywords in these contexts is not really an option. Otherwise, I would allow my parser to accept input that is actually a syntax error.
Approach 2: Identify literals that might conflict with keywords and let their parsers fail if they consume input that was a keyword.
There are two negative side effects of this approach I want to demonstrate in the following example:
open FParsec
type Ast =
| Var of string
| Number of string
| Assignment of Ast * Ast
let assign = skipString ":=" >>. spaces
let number = regex "\d+" <?> "number" |>> Ast.Number
let var =
regex "[a-z]\w*"
.>> spaces
<?> "variable"
>>= (fun s ->
if List.contains s ["for"; "if"; "else"] then // check if s is a keyword
fail "variable expected" // fail if s is a keyword
else
preturn (Ast.Var s) // return Ast.Var s otherwise
)
let varOrNumber = choice [ var ; number ]
let assignment = (var .>> assign) .>>. varOrNumber |>> Ast.Assignment
let parser = assignment
let p1 = run parser "x := y"
printfn "%O" p1
let p2 = run parser "x := 42"
printfn "%O" p2
printfn "---------"
let p3 = run parser "# := y"
printfn "failure p3 (correct message and position)\n%O" p3
printfn "---------"
let p4 = run parser "for := y"
printfn "failure p4 (wrong message and position)\n%O" p4
printfn "---------"
let p5 = run parser "x := #"
printfn "failure p5 (correct message and position)\n%O" p5
printfn "---------"
let p6 = run parser "x := for"
printfn "failure p6 (wrong message and position)\n%O" p6
This will output
Success: Assignment (Var "x", Var "y")
Success: Assignment (Var "x", Number "42")
---------
failure p3 (correct message and position)
Failure:
Error in Ln: 1 Col: 1
# := y
^
Expecting: variable
---------
failure p4 (wrong message and position)
Failure:
Error in Ln: 1 Col: 5
for := y
^
variable expected
---------
failure p5 (correct message and position)
Failure:
Error in Ln: 1 Col: 6
x := #
^
Expecting: number or variable
---------
failure p6 (wrong message and position)
Failure:
Error in Ln: 1 Col: 9
x := for
^
Note: The error occurred at the end of the input stream.
variable expected
In this example, the parser var
has been modified to fail if the consumed input was some keyword.
Note the disadvantages of this approach:
p3
and p4
, the grammar expects a variable
, while in p5
and p6
, the grammar expects either a variable
or a number
. While p3
and p5
output the correct FParsec error messages, p4
and p6
output the custom error message that is (only accidentally) correct in the context of p4
.fun s
failing if s
is a keyword consumes the input of the keyword so the position of the error shifts to the first character after the misplaced keyword. This position has nothing to do with the error because, actually, the error should occur before consuming the keyword, not thereafter. In my opinion, an erroneous input should never be consumed by a parser, even if it fails.Is there a way to write an FParsec parser that would correctly identify misplaced keywords in the input and still preserve the original FPArsec error messages and positions?
Upvotes: 1
Views: 58
Reputation: 17038
The FParsec documentation describes a handy parser called resultSatisfies
that backtracks the way you want:
let resultSatisfies predicate msg (p: Parser<_,_>) : Parser<_,_> =
let error = messageError msg
fun stream ->
let state = stream.State
let reply = p stream
if reply.Status <> Ok || predicate reply.Result then reply
else
stream.BacktrackTo(state) // backtrack to beginning
Reply(Error, error)
In your case, I think you could use it like this:
let var =
regex "[a-z]\w*"
.>> spaces
<?> "variable"
|> resultSatisfies (fun s ->
List.contains s ["for"; "if"; "else"] |> not) "variable expected"
|>> Ast.Var
I haven't tried to vary the error message by context. I think that's something you would have to work on yourself.
Output is:
Success: Assignment (Var "x", Var "y")
Success: Assignment (Var "x", Number "42")
---------
failure p3 (correct message and position)
Failure:
Error in Ln: 1 Col: 1
# := y
^
Expecting: variable
---------
failure p4 (wrong message and position)
Failure:
Error in Ln: 1 Col: 1
for := y
^
variable expected
---------
failure p5 (correct message and position)
Failure:
Error in Ln: 1 Col: 6
x := #
^
Expecting: number or variable
---------
failure p6 (wrong message and position)
Failure:
Error in Ln: 1 Col: 6
x := for
^
Expecting: number
Other error messages:
variable expected
Upvotes: 2