Clark Gaebel

Reputation: 17988

Fixing a bad JSON grammar

I've just started learning about parsing, and I wrote this simple parser in Haskell (using parsec) to read JSON and construct a simple tree for it. I am using the grammar in RFC 4627.

However, when I try parsing the string {"x":1 }, I'm getting the output:

parse error at (line 1, column 8):
unexpected "}"
expecting whitespace character or ","

This only seems to happen when there are spaces before a closing bracket (]) or closing brace (}).

What have I done wrong? If I avoid whitespace before a closing symbol, it works perfectly.

Upvotes: 3

Views: 615

Answers (2)

Tikhon Jelvis

Reputation: 68172

A general solution would be to have all your parsers skip trailing whitespace. Check out lexeme (in ParsecToken) in the Parsec docs for a neat way to do this or just whip up a simple version yourself:

 -- Run the given parser, then skip any whitespace that follows it.
 lexeme parser = do result <- parser
                    spaces
                    return result

Then use this function on all of your tokens (like numerical literals). This way you only ever have to worry about the whitespace at the very beginning of an expression.

For more info about ParsecToken and friends, look at the "Lexical Analysis" section of the Parsec docs.

It makes sense to only skip whitespace after a token except at the very beginning where you can skip it manually. You should take this approach even if you end up not using the ParsecToken module.

It seems you already have tok which acts like my lexeme except it consumes whitespace on both sides. Change it to only consume whitespace after the token and just ignore the whitespace at the very beginning of the input manually. That should (ideally :)) fix the problem.
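As a minimal sketch of this approach, assuming Parsec 3 (`Text.Parsec`): the grammar below is a hypothetical stand-in for the question's JSON parser (a brace-delimited, comma-separated list of integers), and the names `intList` and `parseIntList` are invented for illustration. Every token skips only the whitespace that follows it, and leading whitespace is skipped once at the top level.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then discard any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Single-character token that skips trailing whitespace only.
tok :: Char -> Parser Char
tok = lexeme . char

-- Hypothetical simplified grammar (not the question's JSON grammar):
-- integers separated by commas, wrapped in braces.
intList :: Parser [Int]
intList = between (tok '{') (tok '}') (lexeme number `sepBy` tok ',')
  where
    number = read <$> many1 digit

-- Leading whitespace is skipped manually, exactly once, before the grammar starts.
parseIntList :: String -> Either ParseError [Int]
parseIntList = parse (spaces *> intList <* eof) "<input>"
```

With this layout, a space before the closing brace has already been consumed by the preceding token, so the separator parser fails without consuming anything and `sepBy` simply stops.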

Upvotes: 1

dflemstr

Reputation: 26167

Parsec doesn't do rewinding and backtracking automatically. When you write sepBy member valueSeparator, the valueSeparator consumes white space, so the parser will parse your value like so:

{"x":1 }
[------- object
%        beginObject
 [-]     name
    %    nameSeparator
     %   jvalue
      [- valueSeparator
       X In valueSeparator: unexpected "}"

Legend:
[--]     full match
%        full char match
[--      incomplete match
X        incomplete char match

When the valueSeparator fails, Parsec won't go back and try a different combination of parses, because one character has already matched in valueSeparator.
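The failure can be reproduced on a cut-down grammar. This is only a sketch: the question's code is not shown here, so `tokBoth` is an assumed shape for its `tok` (whitespace on both sides of the character), and `objBoth` is a hypothetical object parser that accepts single digits between braces.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed shape of the question's tok: skip whitespace on BOTH sides.
tokBoth :: Char -> Parser Char
tokBoth c = spaces *> char c <* spaces

-- Hypothetical cut-down object: digits separated by commas inside braces.
objBoth :: Parser String
objBoth = tokBoth '{' *> (digit `sepBy` tokBoth ',') <* tokBoth '}'

-- The separator consumes the space, then fails on '}'. Because input was
-- consumed, sepBy reports the failure instead of ending the list.
failsOnSpace :: Either ParseError String
failsOnSpace = parse objBoth "" "{1 }"

-- Without the space, the separator fails without consuming input,
-- so sepBy ends the list and the closing brace is parsed normally.
worksWithoutSpace :: Either ParseError String
worksWithoutSpace = parse objBoth "" "{1}"
```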

You have two options to solve your problem:

  1. Since white space is insignificant in JSON, always consume white space after a significant token, never before. A tok should therefore only consume white space after the char: tok c = char c *> ws (with (*>) from Control.Applicative; use char c <* ws instead if you need the matched character as the result). Apply the same rule to all the other parsers. Because you then never consume white space after having entered the "wrong parser", you never need to backtrack.
  2. Use backtracking in Parsec by adding try in front of parsers that might consume input before failing, so that they rewind their input when they fail.
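Both options can be sketched on a cut-down grammar. The parsers below are hypothetical stand-ins for the question's code (single digits between braces, with `spaces` playing the role of `ws`), and all names are invented for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Option 1: consume whitespace only AFTER the character.
tokAfter :: Char -> Parser Char
tokAfter c = char c <* spaces

-- Every token, including the digit, skips its own trailing whitespace.
objAfter :: Parser String
objAfter = tokAfter '{' *> ((digit <* spaces) `sepBy` tokAfter ',') <* tokAfter '}'

-- Option 2: keep whitespace on both sides, but wrap the part that can fail
-- after consuming input in try, so a failed match rewinds that input.
tokTry :: Char -> Parser Char
tokTry c = try (spaces *> char c) <* spaces

objTry :: Parser String
objTry = tokTry '{' *> (digit `sepBy` tokTry ',') <* tokTry '}'
```

In this sketch both `parse objAfter "" "{1 }"` and `parse objTry "" "{1 }"` succeed. Note that `objAfter` does not skip whitespace before the opening brace; with option 1 that has to be done once at the top level.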

EDIT: updated ASCII graphic to make more sense.

Upvotes: 6
