Reputation: 583
I've migrated quite a large ANTLR2 grammar to ANTLR4, and reached a point where the output of both grammars is almost identical, apart from a few edge cases. However, some files are extremely slow to parse (even with SLL prediction mode and BailErrorStrategy), so I'm wondering how I can find which rules should be fixed first.
I've already gathered some statistics with Parser#setProfile(), but I don't know how to interpret the results in each DecisionInfo object. Is there any good documentation on how to start optimizing a large ANTLR4 grammar, and on which rabbit to chase first?
Upvotes: 2
Views: 968
Reputation: 583
As I didn't know what to look for in the DecisionInfo objects, here's what I found; it helped me improve the parse time by at least an order of magnitude.
First, I enabled profiling on the parser with org.antlr.v4.runtime.Parser.setProfile(boolean profile), then executed it in SLL prediction mode with org.antlr.v4.runtime.Parser.getInterpreter().setPredictionMode(PredictionMode.SLL) on thousands of files, and browsed through the rules with the highest prediction time:
// Decisions that spent more than 100 ms in prediction, slowest first
// (timeInPrediction is in nanoseconds; ruleNames and _ATN come from the
// generated parser, here named Proparse)
Arrays.stream(parser.getParseInfo().getDecisionInfo())
    .filter(decision -> decision.timeInPrediction > 100000000)
    .sorted((d1, d2) -> Long.compare(d2.timeInPrediction, d1.timeInPrediction))
    .forEach(decision -> System.out.println(
        String.format("Time: %d in %d calls - LL_Lookaheads: %d Max k: %d Ambiguities: %d Errors: %d Rule: %s",
            decision.timeInPrediction / 1000000,
            decision.invocations, decision.SLL_TotalLook,
            decision.SLL_MaxLook, decision.ambiguities.size(),
            decision.errors.size(),
            Proparse.ruleNames[Proparse._ATN.getDecisionState(decision.decision).ruleIndex])));
and then at the highest max lookahead, using the same pipeline except for:
    .filter(decision -> decision.SLL_MaxLook > 50)
    .sorted((d1, d2) -> Long.compare(d2.SLL_MaxLook, d1.SLL_MaxLook))
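The filter-and-sort pattern itself can be checked in isolation. Here is a minimal, self-contained sketch using a stub record in place of ANTLR's DecisionInfo (the field names mirror the real class, but the stub, the sample numbers, and the threshold are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MaxLookReport {
    // Stub standing in for org.antlr.v4.runtime.atn.DecisionInfo;
    // only the two fields used by the query are modeled.
    public record Decision(int decision, long sllMaxLook) {}

    // Returns decision numbers whose max lookahead exceeds the threshold,
    // worst offender first.
    public static List<Integer> worstDecisions(Decision[] decisions, long threshold) {
        return Arrays.stream(decisions)
                .filter(d -> d.sllMaxLook() > threshold)
                .sorted((d1, d2) -> Long.compare(d2.sllMaxLook(), d1.sllMaxLook()))
                .map(Decision::decision)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented sample data: decisions 1 and 2 exceed the threshold of 50.
        Decision[] stats = {
            new Decision(0, 12), new Decision(1, 220), new Decision(2, 75)
        };
        System.out.println(worstDecisions(stats, 50)); // prints [1, 2]
    }
}
```

The same shape works for the prediction-time query: only the filter predicate and the sort key change.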
This gave me 4 rules where most of the time was spent, and in this case that was enough to see what had to be changed (by knowing where to look for problems).
Upvotes: 4