Reputation: 331
I am trying to track down a parsing performance issue which results in a ~40x increase in parse times. The issue is at:
https://github.com/bkiers/Liqp/blob/master/src/main/antlr4/liquid/parser/v4/LiquidParser.g4#L226
Running this as-is over a large file results in parse times around 1900ms, whereas if I drop the unparsed=not_out_end?
rule then the parse time is around 50ms.
Running the ANTLR profiler as-is I have:
rule | time | invocations | lookahead | lookahead(max) | ambiguities | errors |
---|---|---|---|---|---|---|
block | 6334 | 2 | 4 | 3 | [] | [] |
atom | 547 | 1 | 3 | 3 | [] | [] |
expr | 1238604 | 5835 | 5835 | 1 | [] | [] |
expr | 1272883181 | 13762 | 114353250 | 39178 | [] | [] |
term | 866103 | 7927 | 7927 | 1 | [] | [] |
lookup | 821322155 | 7496 | 73441151 | 39182 | [] | [] |
lookup | 399943 | 3748 | 3748 | 1 | [] | [] |
lookup | 397720 | 3748 | 3748 | 1 | [] | [] |
If I drop the usage of the not_out_end
rule:
rule | time | invocations | lookahead | lookahead(max) | ambiguities | errors |
---|---|---|---|---|---|---|
block | 5943 | 2 | 4 | 3 | [] | [] |
atom | 997 | 1 | 3 | 3 | [] | [] |
expr | 719727 | 5835 | 5835 | 1 | [] | [] |
expr | 4288017 | 13762 | 54624 | 21 | [] | [] |
term | 453338 | 7927 | 7927 | 1 | [] | [] |
lookup | 10262024 | 7496 | 52056 | 20 | [] | [] |
lookup | 204689 | 3748 | 3748 | 1 | [] | [] |
lookup | 205296 | 3748 | 3748 | 1 | [] | [] |
We can see above that there is almost an 10000x increase in lookaheads
.
I've been trying to reduce the # of lookaheads by rewriting the output
rule, but with no success - there are terms in not_out_end
which seem to trigger a lookahead, and with several of those it results in a combinatorial explosion apparently.
Upvotes: 3
Views: 52