Reputation: 772
I'm currently writing a parser for ECMAScript 5 (as a toy). The standard dictates how logical or expressions should be parsed:
<LogicalORExpression> :
<LogicalANDExpression>
<LogicalORExpression> || <LogicalANDExpression>
basicly this is equivalent to
<logicalOrExpression> = [<logicalOrExpression> ||] <LogicalAndExpression>
but how should I parse this without running into an infite loop? My current parser obviously does:
logicalOrExpression :: Parser LogicalOrExpression
logicalOrExpression = do
orExpr <- optional $ do
e <- logicalOrExpression
_ <- symbol "||"
return e
andExpr <- logicalAndExpression
case orExpr of
Just e -> return $ LogicalOrExpression (e, andExpr)
Nothing -> return $ AndExpression andExpr
Thanks
Upvotes: 0
Views: 434
Reputation: 44654
It's easiest to use megaparsec
's built-in tools if you need to parse a grammar of operators with precedence and associativity.
expr = makeExprParser term table
where
term = literal <|> parenthesised expr
table = [[InfixL (string "&&" $> And)], [InfixL (string "||" $> Or)]]
For suitable definitions of literal
and parenthesised
, this'll parse a grammar of literal expressions composed with left-associative infix &&
and ||
operators, with &&
having greater precedence than ||
. Megaparsec takes care of the tedious work of generating an LL(k) parser, and produces correct (left-associative, in this instance) parse trees.
Of course JavaScript's expression grammar is much larger than two operators. This example can be straightforwardly extended to include (eg) unary prefix operators like !
, postfix function calls, etc. See the module's documentation.
Upvotes: 3
Reputation: 772
As I want to stay true to the spec in terms of the AST produced, I decided to switch to an Earley based parser and not parser combinators, as Earley's algorithm can handle left recursion.
If I would be OK with flattening the grammar, I would use the answer of Benjamin Hodgson
Upvotes: -1
Reputation: 116174
That grammar looks equivalent to
<LogicalORExpression> :
<LogicalANDExpression>
<LogicalANDExpression> || <LogicalORExpression>
which becomes
<LogicalORExpression> :
<LogicalANDExpression> [|| <LogicalORExpression>]
In general, you need to rewrite the grammar in (roughly) LL(1) form, if possible.
Upvotes: 1
Reputation: 27225
The empty string matches this parser, which I believe, causes infinite recursion in Magaparsec. I think you are missing a "term" or "boolean" somewhere in your function. If I wrote True || False
what would capture the first "True"
Upvotes: 0