Reputation: 1336
I have been searching for an answer to this for about an hour now. A given ANTLR lexer rule consists of two (or more) sub-rules, and the lexer now produces a separate AST node for each of them.
Example:
[...]
variable: '$' CamelCaseIdentifier;
CamelCaseIdentifier: ('a'..'z') Identifier*;
Identifier: ('a'..'z' | 'A' .. 'Z' | '0'..'9')+;
[...]
Given the input [...]$a[...]
the result is ..., $, a, ...
I am looking for a way to tell the lexer that these rules should not be separated, so that the result is ..., $a, ...
Could anyone help me out?
Upvotes: 1
Views: 2449
Reputation: 6615
I am a beginner at compilers and ANTLR, but from my limited understanding, an uppercase (lexer) rule is only for regular expressions. Lowercase (parser) rules can also double as lexer rules (see [1]). So it shouldn't matter whether variable is uppercase or lowercase, right?
Anyway, I may be wrong, but wouldn't it be simpler to just do this:
[...]
variable: '$' ('a'..'z' | 'A' .. 'Z') ALPHANUM*;
ALPHANUM: ('a'..'z' | 'A' .. 'Z' | '0'..'9');
[...]
?
If you plan on reusing ('a'..'z' | 'A' .. 'Z'), then you should do:
[...]
variable: '$' ALPHA ALPHANUM*;
fragment ALPHA: ('a'..'z' | 'A' .. 'Z');
ALPHANUM: (ALPHA | '0'..'9');
[...]
Apologies if this is completely off base, I am still learning.
Upvotes: 0
Reputation: 170158
Parser rules start with a lowercase letter and lexer rules with an uppercase letter. When you output an AST, each individual token in a parser rule becomes a separate node, so you'll want to make the variable rule a lexer rule instead of a parser rule:
Variable : '$' CamelCaseIdentifier;
CamelCaseIdentifier : ('a'..'z') Identifier*;
Identifier : ('a'..'z' | 'A' .. 'Z' | '0'..'9')+;
But if you do it like this, the input 123456 will be tokenized as an Identifier, which is probably not what you want. Besides, the Identifier rule is better named AlphaNum. And if you make it a fragment rule, you make sure the lexer will never produce any AlphaNum tokens on its own, but will only use AlphaNum inside other lexer rules (like your CamelCaseIdentifier rule). If you also want a rule that matches an Identifier, do something like this:
Variable : '$' (CamelCaseIdentifier | Identifier);
CamelCaseIdentifier : 'a'..'z' AlphaNum*;
Identifier : 'A'..'Z' AlphaNum*;
// a fragment rule can't be used inside parser rules, only in lexer rules
fragment AlphaNum : 'a'..'z' | 'A' .. 'Z' | '0'..'9';
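To see the effect of the rules above without running ANTLR, here is a minimal Python sketch that imitates them with regular expressions. It is only a rough longest-match approximation, not ANTLR's generated lexer: the names TOKEN_RULES and tokenize are hypothetical, and skipping whitespace is an assumption (the grammar above has no whitespace rule).

```python
import re

# Hypothetical regex stand-ins for the lexer rules above (not ANTLR output).
# AlphaNum is a fragment: it is only used inside other patterns and is
# never listed as a token rule of its own.
ALPHANUM = r"[a-zA-Z0-9]"
TOKEN_RULES = [
    ("Variable", re.compile(rf"\$(?:[a-z]{ALPHANUM}*|[A-Z]{ALPHANUM}*)")),
    ("CamelCaseIdentifier", re.compile(rf"[a-z]{ALPHANUM}*")),
    ("Identifier", re.compile(rf"[A-Z]{ALPHANUM}*")),
]


def tokenize(text):
    """Return (rule name, matched text) pairs using longest-match selection."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Assumption: whitespace is skipped, as a hidden WS rule would do.
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier rule wins.
            if match and (best is None or match.end() > best[1].end()):
                best = (name, match)
        if best is None:
            raise ValueError(f"no lexer rule matches {text[pos]!r} at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens
```

With these stand-in rules, tokenize("$a") yields the single pair ("Variable", "$a") rather than separate entries for $ and a, and tokenize("123456") raises ValueError because no token rule starts with a digit once AlphaNum is only a fragment.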
Upvotes: 2
Reputation: 6921
Maybe try making all the rule names uppercase?
Edited: with an example:
grammar Dummy;
prog : VARIABLE*;
VARIABLE: '$' CAMELCASEIDENTIFIER;
CAMELCASEIDENTIFIER: ('a'..'z') IDENTIFIER*;
IDENTIFIER: ('a'..'z' | 'A' .. 'Z' | '0'..'9')+;
WS: (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN; };
Upvotes: 0