Reputation: 449
I defined the following simple left recursive expression grammar for antlr4:
grammar BugExample;
// Rule Definitions
value: expression EOF ;
real:
'-'? CONSTANT #constantReal |
FLOAT #variableReal
;
variable: IDENTIFIER ;
expression: // Precedence (highest to lowest)
real #realExpression |
variable #variableExpression |
// expression '!' #factorialExpression |
'-' expression #inversionExpression
;
// Token Definitions
FRACTION: '.' ('0'..'9')* '1'..'9' ;
CONSTANT: 'e' | 'pi' ;
FLOAT: INTEGER FRACTION? ('e' INTEGER)? ;
IDENTIFIER: ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')* ;
SPACE: (' '|'\t'|'\r'|'\n')+ -> channel(HIDDEN) ;
fragment
NATURAL: '1'..'9' ('0'..'9')* ;
fragment
INTEGER: '0' | '-'? NATURAL ;
Notice the commented out factorial expression in the expression types. Also notice that the definition of a FLOAT token allows for negative values so a negative real expression should take precedence over an inversion expression. With the factorial expression commented out, the generated JS parser does indeed parse a negative constant '-e' correctly as a real expression. However, if we uncomment the factorial expression and regenerate the parser, the '-e' is suddenly parsed as an inversion expression. Here is the test code showing it:
'use strict';
var language = require('../BugExample');
var testCase = require('nodeunit').testCase;
module.exports = testCase({
'Test Parser': function(test) {
var testValues = ['5.27e-15', '-5.3e22','e', '-e', 'expo', '-expo'];
var expectedResults = [
'RealExpressionContext', // positive real number
'RealExpressionContext', // negative real number
'RealExpressionContext', // positive real constant
'RealExpressionContext', // positive real constant
'VariableExpressionContext', // variable value
'InversionExpressionContext' // negative variable value
];
test.expect(testValues.length);
for (var i = 0; i < testValues.length; i++) {
var value = testValues[i];
console.log('\nTesting: ' + value);
var expression = language.parseValue(value).getChild(0);
test.strictEqual(expression.constructor.name, expectedResults[i]);
}
test.done();
}
});
It turns out that adding any left recursive sub-rule type listed in the "Definitive Antlr 4 Reference" that begins with the expression e.g. "binary", "ternary", and "unary suffix" expressions, will cause this problem. I have only verified this for the generated JS parser. When I look at the generated parser code it appears that the order of the case blocks in the expression() function get randomized when the problem occurs whereas they are in precedence order when the factorial expression is commented out. Not sure if this is the cause or not, the code is too complex for me to understand ;-)
I placed the JavaScript project showing this example out at GitHub: https://github.com/derknorton/antlr4-bug-example
To test it out do the following:
git clone https://github.com/derknorton/antlr4-bug-example
cd antlr4-bug-example
npm install
grunt generate build
# it should work correctly
# then edit test/TestBugExample.js to remove commented factorial expression
grunt generate build
# it should now show the problem
Hopefully I have provided enough detail for the antlr4 experts to determine the problem. Any help will be greatly appreciated.
Upvotes: 0
Views: 504
Reputation: 449
This problem turned out to be with a earlier versions of the antlr 4 parser generator. My Grunt.js file was using the grunt-antlr4
task which hasn't been updated to use the latest version of antlr4. It was using version 4.5.1. The problem seems to have been fixed after that version. Full details of this realization are documented here: https://github.com/antlr/antlr4/issues/2201
Upvotes: 1