Reputation: 690
I am trying to match complex numbers written in different notations, one of which uses the cis function, like this: MODULUS cis PHASE
The problem is that my identifier rule matches the cis as well as the start of the number following it, and since that match is longer than the CIS token itself, the lexer always returns an identifier token type. How can I avoid that?
Here's the grammar:
grammar Sandbox;
input : number? CIS UNSIGNED
| IDENTIFIER
;
number : FLOAT
| UFLOAT
| UINT
| INT
;
fragment DIGIT : [0-9] ;
UFLOAT : UINT (DOT UINT? | 'f') ;
FLOAT : SUB UFLOAT ;
UINT : DIGITS ;
INT : SUB UINT ;
UNSIGNED : UFLOAT
| UINT
;
DIGITS : DIGIT+ ;
// Specific lexer rules
CIS : 'cis' ;
SUB : '-' ;
DOT : '.' ;
WS : [ \t]+ -> skip ;
NEWLINE : '\r'? '\n' ;
IDENTIFIER : [a-zA-Z_]+[a-zA-Z0-9_]* ; // has to be after complex so i or cis doesn't match this first
Edit:
The input I was trying to parse is the complex number 1+i, expressed through its modulus and phase like this: 1.4142135623730951cis0.7853981633974483
My actual problem is that the IDENTIFIER rule matches cis0 instead of just matching the CIS lexer rule, even though CIS is defined before it.
I vaguely know that ANTLR chooses the rule that produces the longest match, but in this case I want to avoid that =o.
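To make the behaviour concrete, here is a minimal Python sketch that imitates the longest-match rule outside ANTLR. It is not ANTLR itself: the RULES table is a hand-translated subset of the grammar above, and tokenize is a hypothetical helper that only assumes "longest match wins, ties go to the rule listed first". Whitespace and signed numbers are left out for brevity.

```python
import re

# Subset of the lexer rules, in grammar order (earlier rules win ties).
RULES = [
    ("UFLOAT", r"[0-9]+(?:\.(?:[0-9]+)?|f)"),
    ("UINT", r"[0-9]+"),
    ("CIS", r"cis"),
    ("DOT", r"\."),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
]

def tokenize(text):
    """Split text into (rule, lexeme) pairs using longest-match selection."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("1.41cis0.78"))
print(tokenize("cis"))
```

In this sketch, cis on its own comes out as CIS because the two candidate matches have equal length and CIS is listed first, whereas in 1.41cis0.78 the IDENTIFIER candidate cis0 is one character longer and wins.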
Upvotes: 3
Views: 850
Reputation: 53345
I see two solutions here:
1. Match the complex number as a whole in a single lexer rule:
COMPLEX: (FLOAT | UFLOAT | UINT | INT) WS* CIS WS* UNSIGNED;
which will be longer than an identifier or the pure CIS keyword (and hence matched first).
2. Use a semantic predicate on the IDENTIFIER rule. As I understand it, the cis sequence is a keyword when it follows a digit (with optional whitespace between them), right? So you could do a lookback (LA(-1)) in your predicate to reject cis as an identifier if that condition is true.
I'd prefer solution 1, because the convention is that single entities (and a complex number is, like a float number or a string, a single logical entity) are matched completely in a lexer rule, not in a parser rule.
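Both ideas can be tried out in plain Python before touching the grammar. The following is only a sketch under assumptions: the regular expressions are a hand-written approximation of the number rules from the question, and match_complex and cis_is_keyword are hypothetical helper names, not part of ANTLR. The lookback also skips blanks, whereas LA(-1) in a real predicate only sees the single preceding character.

```python
import re

# Approximation of the question's number rules (assumption, not generated by ANTLR).
NUMBER = r"-?[0-9]+(?:\.[0-9]*|f)?"
UNSIGNED = r"[0-9]+(?:\.[0-9]*|f)?"

# Solution 1: the whole complex literal is a single token.
COMPLEX = re.compile(rf"{NUMBER}[ \t]*cis[ \t]*{UNSIGNED}")

def match_complex(text, pos=0):
    """Return the complex literal starting at pos, or None if there is none."""
    m = COMPLEX.match(text, pos)
    return m.group() if m else None

# Solution 2: lookback predicate deciding whether 'cis' is a keyword here.
def cis_is_keyword(text, pos):
    """True if 'cis' starts at pos and the previous non-blank character is a digit."""
    if not text.startswith("cis", pos):
        return False
    before = text[:pos].rstrip(" \t")
    return bool(before) and before[-1].isdigit()

literal = "1.4142135623730951cis0.7853981633974483"
print(match_complex(literal) == literal)               # whole input is one token
print(cis_is_keyword(literal, literal.index("cis")))   # preceded by a digit
print(cis_is_keyword("cisco", 0))                      # nothing before it
```

With solution 1 the lexer starts the token at the first digit and consumes everything through the phase, so it never gets the chance to start an identifier at the c. The lookback in this sketch only checks for a digit, as described above, so a modulus written with the 'f' suffix or a trailing dot would need an extra case.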
Upvotes: 3
Reputation: 690
I'm just putting this here because I think it could be a potential solution, although I'd prefer not to use semantic predicates because they tie my grammar to a specific target language =/ (I've never used them before, so I'm not sure whether there are other caveats too):
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]* { identifierIsNotReserved() }?;
Then we just need to implement the identifierIsNotReserved method to check whether the identifier rule consumed a reserved keyword and, if so, prevent the rule from being applied. To quote:
A semantic predicate is a block of arbitrary code in the target language surrounded by {...}?, which evaluates to a boolean value. If the returned value is false, the lexer rule is skipped.
Edit: I forgot to add the reference to where I found this; here it is: https://riptutorial.com/antlr/example/11237/actions-and-semantic-predicates
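As a rough idea of what such a check might look like, here is a Python sketch of the predicate body. Everything in it is an assumption for illustration: identifier_is_not_reserved is a hypothetical stand-in for the identifierIsNotReserved method, and RESERVED contains only the one keyword from the grammar. Note that for the input cis0 the text consumed by the identifier rule is cis0, not cis, so an exact comparison alone would not reject it; the prefix test is an extra assumption added to cover that case.

```python
# Hypothetical reserved-word list; only 'cis' comes from the grammar in the question.
RESERVED = ("cis",)

def identifier_is_not_reserved(candidate: str) -> bool:
    """Return False when the candidate text should not become an IDENTIFIER."""
    for keyword in RESERVED:
        if candidate == keyword:
            return False  # the candidate is exactly a keyword
        rest = candidate[len(keyword):]
        # Assumption: a keyword glued to a digit (e.g. 'cis0') is keyword + number.
        if candidate.startswith(keyword) and rest[:1].isdigit():
            return False
    return True

for name in ("cis", "cis0", "cisco", "x1"):
    print(name, identifier_is_not_reserved(name))
```

One consequence of the prefix test is that names such as cis0 can no longer be used as identifiers at all, which may or may not be acceptable.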
Upvotes: 0