Reputation: 79
I recently ran into a performance problem with my program. Investigation eventually pointed to an issue deep inside ANTLR4, which I use to parse SQL. As shown in the code below, there is a synchronized block on dfa.states. That block effectively caps parsing performance on a machine with 8 or more cores. Has anyone run into this and found a solution?
protected DFAState addDFAState(ATNConfigSet configs) {
    /* the lexer evaluates predicates on-the-fly; by this point configs
     * should not contain any configurations with unevaluated predicates.
     */
    assert !configs.hasSemanticContext;
    DFAState proposed = new DFAState(configs);
    ATNConfig firstConfigWithRuleStopState = null;
    for (ATNConfig c : configs) {
        if ( c.state instanceof RuleStopState ) {
            firstConfigWithRuleStopState = c;
            break;
        }
    }
    if ( firstConfigWithRuleStopState!=null ) {
        proposed.isAcceptState = true;
        proposed.lexerActionExecutor = ((LexerATNConfig)firstConfigWithRuleStopState).getLexerActionExecutor();
        proposed.prediction = atn.ruleToTokenType[firstConfigWithRuleStopState.state.ruleIndex];
    }
    DFA dfa = decisionToDFA[mode];
    synchronized (dfa.states) {
        DFAState existing = dfa.states.get(proposed);
        if ( existing!=null ) return existing;
        DFAState newState = proposed;
        newState.stateNumber = dfa.states.size();
        configs.setReadonly(true);
        newState.configs = configs;
        dfa.states.put(newState, newState);
        return newState;
    }
}
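To show the pattern in isolation, here is a minimal sketch in plain Java. It is a toy stand-in, not ANTLR code: the names LockedStateTable, addState and hammer are made up for illustration, and strings stand in for DFAState objects. It only demonstrates that, with the lookup placed inside the synchronized block, every call takes the lock, including calls that find an existing state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Toy stand-in for the dfa.states map (hypothetical, not ANTLR code). */
class LockedStateTable {
    private final Map<String, Integer> states = new HashMap<>();
    private final AtomicLong lockEntries = new AtomicLong();

    /** Same get-or-add shape as addDFAState: the lookup itself sits inside the lock. */
    int addState(String proposed) {
        synchronized (states) {
            lockEntries.incrementAndGet();          // count every trip through the lock
            Integer existing = states.get(proposed);
            if (existing != null) return existing;  // a hit has still acquired the lock
            int stateNumber = states.size();
            states.put(proposed, stateNumber);
            return stateNumber;
        }
    }

    long lockEntries() { return lockEntries.get(); }

    int size() {
        synchronized (states) { return states.size(); }
    }
}

class LockedStateTableDemo {
    /** Starts `threads` workers; each requests the same `distinct` states `rounds` times. */
    static LockedStateTable hammer(int threads, int distinct, int rounds) {
        LockedStateTable table = new LockedStateTable();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    for (int k = 0; k < distinct; k++) {
                        table.addState("configs-" + k);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        LockedStateTable table = hammer(4, 10, 100);
        System.out.println("distinct states: " + table.size());
        System.out.println("lock entries:    " + table.lockEntries());
    }
}
```

The number of lock entries grows with the number of calls, not with the number of distinct states, so all threads that reach this method are serialized on the one shared map.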
Upvotes: 1
Views: 524
Reputation: 79
After a few days of struggle, I found a solution. As Mike Lischke said, the synchronized block seems intended to reduce the memory footprint. But it has a significant impact on performance on a multi-core machine under a heavy SQL parsing workload. I was trying to parse a 100 GB+ SQL file generated by mysqldump.
My solution is to create a custom interpreter that uses its own DFA array instead of the shared static one. The result is almost 10 times faster on my 16-core AMD Threadripper, with CPU usage going above 95%.
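// Replace the default interpreter with one that uses a private DFA array.
setInterpreter(new LexerATNSimulator(this, _ATN, getDFA(), new PredictionContextCache()));

// Builds a fresh, empty DFA per decision, so nothing is shared with other lexer instances.
private DFA[] getDFA() {
    DFA[] result = new DFA[_ATN.getNumberOfDecisions()];
    for (int i = 0; i < _ATN.getNumberOfDecisions(); i++) {
        result[i] = new DFA(_ATN.getDecisionState(i), i);
    }
    return result;
}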
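The trade-off can be sketched with a toy model in plain Java. This is not ANTLR code and not a benchmark: StateCache, CacheSharingDemo and their methods are hypothetical names, and strings stand in for DFA states. It only shows the memory side of the change, under the assumption that every worker ends up seeing the same set of states: with one shared cache the states are stored once, while with one cache per worker each worker builds its own copy and no lock is ever contended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for one DFA's state map (hypothetical, not ANTLR code). */
class StateCache {
    private final Map<String, Integer> states = new HashMap<>();

    /** Returns the number of an existing state, or registers a new one. */
    int intern(String key) {
        synchronized (states) {
            Integer existing = states.get(key);
            if (existing != null) return existing;
            int number = states.size();
            states.put(key, number);
            return number;
        }
    }

    int size() {
        synchronized (states) { return states.size(); }
    }
}

class CacheSharingDemo {
    /** Runs `threads` workers over the same keys; returns the distinct caches they used. */
    static List<StateCache> run(int threads, int distinctKeys, boolean shared) {
        StateCache common = new StateCache();
        List<StateCache> caches = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            // shared == true models the static DFA; false models one DFA array per lexer
            StateCache cache = shared ? common : new StateCache();
            if (!caches.contains(cache)) caches.add(cache);
            Thread worker = new Thread(() -> {
                for (int round = 0; round < 100; round++) {
                    for (int k = 0; k < distinctKeys; k++) {
                        cache.intern("state-" + k);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return caches;
    }

    /** Total entries across all caches, a rough proxy for memory use. */
    static int totalEntries(List<StateCache> caches) {
        int total = 0;
        for (StateCache cache : caches) total += cache.size();
        return total;
    }

    public static void main(String[] args) {
        System.out.println("shared entries:       " + totalEntries(run(4, 50, true)));
        System.out.println("per-instance entries: " + totalEntries(run(4, 50, false)));
    }
}
```

In this model the per-instance variant stores as many copies of the states as there are workers. That is the memory cost paid for removing the contention, and each private cache also has to be warmed up from scratch.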
Upvotes: 1
Reputation: 53337
All parser instances for a given language share the same DFA (it's a static structure) for memory efficiency reasons. However, that requires making this structure thread-safe, since parsers can be used in background threads. There is no way around that.
Upvotes: 0