Reputation: 17988
I've just started learning about parsing, and I wrote this simple parser in Haskell (using parsec) to read JSON and construct a simple tree for it. I am using the grammar in RFC 4627.
However, when I try parsing the string {"x":1 }, I'm getting the output:
parse error at (line 1, column 8): unexpected "}" expecting whitespace character or ","
This only seems to happen when there are spaces before a closing bracket (]) or brace (}).
What have I done wrong? If I avoid whitespace before a closing symbol, it works perfectly.
Upvotes: 3
Views: 615
Reputation: 68172
A general solution would be to have all your parsers skip trailing whitespace. Check out lexeme (in ParsecToken) in the Parsec docs for a neat way to do this, or just whip up a simple version yourself:
    lexeme parser = do
        result <- parser
        spaces
        return result
Then use this function on all of your tokens (like numerical literals). This way you only ever have to worry about the whitespace at the very beginning of an expression.
For more info about ParsecToken and friends, look at the "Lexical Analysis" section of the Parsec docs.
It makes sense to only skip whitespace after a token, except at the very beginning, where you can skip it manually. You should take this approach even if you end up not using the ParsecToken module.
It seems you already have tok, which acts like my lexeme except that it consumes whitespace on both sides. Change it to only consume whitespace after the token and just skip the whitespace at the very beginning of the input manually. That should (ideally :)) fix the problem.
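As a rough sketch of how this could look in practice (the names symbol, number, numberList and parseNumberList are made up for illustration and are not from the question's code; number only handles unsigned integers):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A punctuation token: the character itself plus trailing whitespace.
symbol :: Char -> Parser Char
symbol = lexeme . char

-- Hypothetical number token: unsigned integers only, for illustration.
number :: Parser Int
number = lexeme (read <$> many1 digit)

-- A bracketed, comma-separated list of numbers, e.g. "[1 , 2 ]".
numberList :: Parser [Int]
numberList = between (symbol '[') (symbol ']') (number `sepBy` symbol ',')

-- Leading whitespace is skipped exactly once, at the top level.
parseNumberList :: String -> Either ParseError [Int]
parseNumberList = parse (spaces *> numberList <* eof) ""
```

Because every token eats the whitespace that follows it, the separator parser never has to start by consuming spaces, so it fails cleanly (without consuming input) when it meets the closing bracket.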
Upvotes: 1
Reputation: 26167
Parsec doesn't do rewinding and backtracking automatically. When you write sepBy member valueSeparator, the valueSeparator consumes white space, so the parser will parse your value like so:
{"x":1 }
[------- object
%        beginObject
 [-]     name
    %    nameSeparator
     %   jvalue
      [- valueSeparator
       X In valueSeparator: unexpected "}"
Legend:
[--] full match
% full char match
[-- incomplete match
X incomplete char match
When the valueSeparator fails, Parsec won't go back and try a different combination of parses, because one character has already matched in valueSeparator.
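The same behaviour can be reproduced in a much smaller parser. This is only a sketch: tokBoth is my guess at the shape of your tok (whitespace on both sides), spaces stands in for your ws, and item/items are made-up stand-ins for member and the enclosing list.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed shape of the original helper: whitespace on both sides of the char.
tokBoth :: Char -> Parser Char
tokBoth c = spaces *> char c <* spaces

-- Hypothetical stand-in for a JSON member: a single digit.
item :: Parser Char
item = digit

-- A bracketed, comma-separated list of digits.
items :: Parser String
items = between (tokBoth '[') (tokBoth ']') (item `sepBy` tokBoth ',')

-- On "[1 ]" the separator eats the space and then cannot match ','.
-- Because input was consumed, sepBy reports the failure instead of stopping.
runItems :: String -> Either ParseError String
runItems = parse items ""
```

With this definition, runItems "[1]" succeeds but runItems "[1 ]" returns a parse error, which mirrors what you are seeing.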
You have two options to solve your problem:
1. Change tok so that it only consumes white space after the char, so its definition is tok c = char c *> ws ((*>) from Control.Applicative); apply the same rule to all the other parsers. Since you'll never consume white space after having entered the "wrong parser" that way, you won't end up having to backtrack.
2. Put try in front of parsers that might consume more than one character, and that should rewind their input if they fail.
EDIT: updated ASCII graphic to make more sense.
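Both options, sketched on a toy list-of-digits parser. The names tokAfter, tokTry, digitTok, listAfter and listTry are invented for this example, and spaces is used in place of your ws, so treat it as an illustration rather than a drop-in patch.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Option 1: consume white space only after the character.
tokAfter :: Char -> Parser ()
tokAfter c = char c *> spaces

-- Under option 1, value parsers skip their trailing white space as well.
digitTok :: Parser Char
digitTok = digit <* spaces

listAfter :: Parser String
listAfter = between (tokAfter '[') (tokAfter ']') (digitTok `sepBy` tokAfter ',')

-- Option 2: keep white space on both sides, but rewind the input on failure.
tokTry :: Char -> Parser ()
tokTry c = try (spaces *> char c *> spaces)

listTry :: Parser String
listTry = between (tokTry '[') (tokTry ']') (digit `sepBy` tokTry ',')
```

Both listAfter and listTry accept "[1 ]". Option 1 tends to be the cheaper choice, since nothing ever has to be re-parsed; try makes the parser give back the white space each time the separator fails.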
Upvotes: 6