ANTLR4 performance issues due to lookahead

Question

I am trying to track down a parsing performance issue which results in a ~40x increase in parse times. The issue is at:

https://github.com/bkiers/Liqp/blob/master/src/main/antlr4/liquid/parser/v4/LiquidParser.g4#L226

Running this as-is over a large file results in parse times around 1900ms, whereas if I drop the unparsed=not_out_end? rule then the parse time is around 50ms.

Running the ANTLR profiler as-is I have:

rule	time	invocations	lookahead	lookahead(max)	ambiguities	errors
block	6334	2	4	3	[]	[]
atom	547	1	3	3	[]	[]
expr	1238604	5835	5835	1	[]	[]
expr	1272883181	13762	114353250	39178	[]	[]
term	866103	7927	7927	1	[]	[]
lookup	821322155	7496	73441151	39182	[]	[]
lookup	399943	3748	3748	1	[]	[]
lookup	397720	3748	3748	1	[]	[]

If I drop the usage of the not_out_end rule:

rule	time	invocations	lookahead	lookahead(max)	ambiguities	errors
block	5943	2	4	3	[]	[]
atom	997	1	3	3	[]	[]
expr	719727	5835	5835	1	[]	[]
expr	4288017	13762	54624	21	[]	[]
term	453338	7927	7927	1	[]	[]
lookup	10262024	7496	52056	20	[]	[]
lookup	204689	3748	3748	1	[]	[]
lookup	205296	3748	3748	1	[]	[]

We can see above that there is almost an 10000x increase in lookaheads.

I've been trying to reduce the # of lookaheads by rewriting the output rule, but with no success - there are terms in not_out_end which seem to trigger a lookahead, and with several of those it results in a combinatorial explosion apparently.

ANTLR4 performance issues due to lookahead

Answers (0)

Related Questions