Reputation: 1328
I'm trying to implement a JavaScript parser in Haskell, but I'm stuck on automatic semicolon insertion. I have created a test project to play around with the problem, but I cannot figure out how to solve it.
In my test project, a program is a list of expressions (unary or binary):
    data Program = Program [Expression]

    data Expression
      = UnaryExpression Number
      | PlusExpression Number Number
The input stream is a list of tokens:
    data Token
      = SemicolonToken
      | NumberToken Number
      | PlusToken
I want to parse inputs like these:
1;
- Unary expression
1 + 2;
- Binary expression
1; 2 + 3;
- Two expressions (unary and binary)
1 2 + 3;
- Same as the previous input, but the first semicolon is missing. The parser consumes token 1, but token 2 is not allowed by any production of the grammar (the next expected token is a semicolon or a plus). The rule of automatic semicolon insertion says that in this case a semicolon is automatically inserted before token 2.
So, what is the most elegant way to implement this parser behavior?
Upvotes: 2
Views: 238
Reputation: 183978
You have

    expression = try unaryExpression <|> plusExpression

but that doesn't work, since a UnaryExpression is a prefix of a PlusExpression. So for

    input2 = [NumberToken Number1, PlusToken, NumberToken Number1, SemicolonToken]

the parser happily parses the first NumberToken and automatically adds a semicolon, since the next token is a PlusToken and not a SemicolonToken. Then it tries to parse the next Expression, but the next token is a PlusToken, and no Expression can start with that.
Change the order in which the parsers are tried,

    expression = try plusExpression <|> unaryExpression

and it will first try to parse a PlusExpression, and only when that fails resort to the shorter parse of a UnaryExpression.
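To make the effect of the ordering concrete, here is a minimal, self-contained sketch. It does not use Parsec: the `Parser` newtype below is a hypothetical stand-in that always backtracks on failure, so no `try` is needed. The constructors `Number2` and `Number3`, the helpers `number`, `token`, `eof`, `semicolon` and `parseWith`, and the names `goodOrder` and `badOrder` are my own assumptions for illustration, not part of your project. The "insertion" is modelled very simply: if no real semicolon follows, the parser consumes nothing and carries on.

```haskell
import Control.Applicative (Alternative (..))

-- Types from the question; Number2 and Number3 are assumed constructors.
data Number = Number1 | Number2 | Number3 deriving (Eq, Show)

data Token = SemicolonToken | NumberToken Number | PlusToken deriving (Eq, Show)

data Expression
  = UnaryExpression Number
  | PlusExpression Number Number
  deriving (Eq, Show)

-- Hypothetical minimal backtracking parser over a token list (not Parsec).
newtype Parser a = Parser { runParser :: [Token] -> Maybe (a, [Token]) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \ts -> case p ts of
    Nothing -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \ts -> Just (a, ts)
  Parser pf <*> Parser pa = Parser $ \ts -> case pf ts of
    Nothing -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Alternative Parser where
  empty = Parser (const Nothing)
  -- Ordered choice: the right parser runs only if the left one fails.
  Parser p <|> Parser q = Parser $ \ts -> case p ts of
    Nothing -> q ts
    r -> r

number :: Parser Number
number = Parser $ \ts -> case ts of
  NumberToken n : rest -> Just (n, rest)
  _ -> Nothing

token :: Token -> Parser ()
token t = Parser $ \ts -> case ts of
  t' : rest | t' == t -> Just ((), rest)
  _ -> Nothing

eof :: Parser ()
eof = Parser $ \ts -> if null ts then Just ((), ts) else Nothing

unaryExpression, plusExpression :: Parser Expression
unaryExpression = UnaryExpression <$> number
plusExpression = PlusExpression <$> number <* token PlusToken <*> number

-- Consume a real semicolon if present; otherwise "insert" one by consuming nothing.
semicolon :: Parser ()
semicolon = token SemicolonToken <|> pure ()

-- Parse a whole token list as a sequence of expressions with the given expression parser.
parseWith :: Parser Expression -> [Token] -> Maybe [Expression]
parseWith expr ts = fst <$> runParser (many (expr <* semicolon) <* eof) ts

goodOrder, badOrder :: [Token] -> Maybe [Expression]
goodOrder = parseWith (plusExpression <|> unaryExpression) -- longer parse first
badOrder = parseWith (unaryExpression <|> plusExpression)  -- the original order
```

With this sketch, `badOrder` fails on your `input2` because the unary parse succeeds first and the leftover `PlusToken` cannot start an expression, while `goodOrder` accepts both `input2` and the `1 2 + 3;` case. Note that this simplified `semicolon` is more permissive than the real JavaScript rule, which also takes line terminators into account.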
Upvotes: 1