Victor Tatai
Victor Tatai

Reputation: 331

ANTLR4 performance issues due to lookahead

I am trying to track down a parsing performance issue which results in a ~40x increase in parse times. The issue is at:

https://github.com/bkiers/Liqp/blob/master/src/main/antlr4/liquid/parser/v4/LiquidParser.g4#L226

Running this as-is over a large file results in parse times around 1900ms, whereas if I drop the unparsed=not_out_end? rule then the parse time is around 50ms.

Running the ANTLR profiler as-is I have:

rule time invocations lookahead lookahead(max) ambiguities errors
block 6334 2 4 3 [] []
atom 547 1 3 3 [] []
expr 1238604 5835 5835 1 [] []
expr 1272883181 13762 114353250 39178 [] []
term 866103 7927 7927 1 [] []
lookup 821322155 7496 73441151 39182 [] []
lookup 399943 3748 3748 1 [] []
lookup 397720 3748 3748 1 [] []

If I drop the usage of the not_out_end rule:

rule time invocations lookahead lookahead(max) ambiguities errors
block 5943 2 4 3 [] []
atom 997 1 3 3 [] []
expr 719727 5835 5835 1 [] []
expr 4288017 13762 54624 21 [] []
term 453338 7927 7927 1 [] []
lookup 10262024 7496 52056 20 [] []
lookup 204689 3748 3748 1 [] []
lookup 205296 3748 3748 1 [] []

We can see above that there is almost an 10000x increase in lookaheads.

I've been trying to reduce the # of lookaheads by rewriting the output rule, but with no success - there are terms in not_out_end which seem to trigger a lookahead, and with several of those it results in a combinatorial explosion apparently.

Upvotes: 3

Views: 52

Answers (0)

Related Questions