Jan van Brügge
Jan van Brügge

Reputation: 772

Parsing optional recursive parser runs in infinite recursion

I'm currently writing a parser for ECMAScript 5 (as a toy). The standard dictates how logical or expressions should be parsed:

<LogicalORExpression> :
    <LogicalANDExpression>
    <LogicalORExpression> || <LogicalANDExpression>

basicly this is equivalent to

<logicalOrExpression> = [<logicalOrExpression> ||] <LogicalAndExpression>

but how should I parse this without running into an infite loop? My current parser obviously does:

logicalOrExpression :: Parser LogicalOrExpression
logicalOrExpression = do
    orExpr <- optional $ do
        e <- logicalOrExpression
        _ <- symbol "||"
        return e
    andExpr <- logicalAndExpression
    case orExpr of
        Just e -> return $ LogicalOrExpression (e, andExpr)
        Nothing -> return $ AndExpression andExpr

Thanks

Upvotes: 0

Views: 434

Answers (4)

Benjamin Hodgson
Benjamin Hodgson

Reputation: 44654

It's easiest to use megaparsec's built-in tools if you need to parse a grammar of operators with precedence and associativity.

expr = makeExprParser term table
    where
        term = literal <|> parenthesised expr
        table = [[InfixL (string "&&" $> And)], [InfixL (string "||" $> Or)]]

For suitable definitions of literal and parenthesised, this'll parse a grammar of literal expressions composed with left-associative infix && and || operators, with && having greater precedence than ||. Megaparsec takes care of the tedious work of generating an LL(k) parser, and produces correct (left-associative, in this instance) parse trees.

Of course JavaScript's expression grammar is much larger than two operators. This example can be straightforwardly extended to include (eg) unary prefix operators like !, postfix function calls, etc. See the module's documentation.

Upvotes: 3

Jan van Br&#252;gge
Jan van Br&#252;gge

Reputation: 772

As I want to stay true to the spec in terms of the AST produced, I decided to switch to an Earley based parser and not parser combinators, as Earley's algorithm can handle left recursion.

If I would be OK with flattening the grammar, I would use the answer of Benjamin Hodgson

Upvotes: -1

chi
chi

Reputation: 116174

That grammar looks equivalent to

<LogicalORExpression> :
    <LogicalANDExpression>
    <LogicalANDExpression> || <LogicalORExpression>

which becomes

<LogicalORExpression> :
    <LogicalANDExpression> [|| <LogicalORExpression>]

In general, you need to rewrite the grammar in (roughly) LL(1) form, if possible.

Upvotes: 1

John F. Miller
John F. Miller

Reputation: 27225

The empty string matches this parser, which I believe, causes infinite recursion in Magaparsec. I think you are missing a "term" or "boolean" somewhere in your function. If I wrote True || False what would capture the first "True"

Upvotes: 0

Related Questions