Reputation: 15
I have a language with keywords like hello that are only keywords in certain types of sentences. In other types of sentences, these words should be matched as an ID, for example. Here's a super simple grammar that tells the story:
grammar Hello;
file : ( sentence )* ;
sentence : 'hello' ID PERIOD
| INT ID PERIOD;
ID : [a-z]+ ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
PERIOD : '.' ;
I'd like these sentences to be valid:
hello fred.
31 cheeseburgers.
6 hello.
but that last sentence doesn't work in this grammar. The word hello is a token of type hello and not of type ID. It seems like the lexer grabs all the hellos and turns them into tokens of that type.
Here's a crazy way to do it, to explain what I want:
sentence : 'hello' ID PERIOD
| INT crazyID PERIOD;
crazyID : ID | 'hello' ;
but in my real language, there are a lot of keywords like hello to deal with, so, yeah, that way seems crazy.
Is there a reasonable, compact, target-language-independent way to handle this?
Upvotes: 1
Views: 614
Reputation: 5991
A standard way of handling keywords:
file : ( sentence )* EOF ;
sentence : key=( KEYWORD | INT ) id=( KEYWORD | ID ) PERIOD ;
KEYWORD : 'hello' | 'goodbye' ; // list others as alts
PERIOD : '.' ;
ID : [a-z]+ ;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
The seeming ambiguity between the KEYWORD and ID rules is resolved by rule order: when both rules match the same text, the lexer picks the rule listed first, so KEYWORD must be listed before ID.
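To make the tie-breaking concrete, here is a minimal lexer-only sketch. The grammar name KeywordDemo and the sample input are made up for illustration. ANTLR's lexer prefers the longest match and falls back on rule order only when two rules match the same text:

```antlr
lexer grammar KeywordDemo;   // hypothetical name, for illustration only

KEYWORD : 'hello' | 'goodbye' ;   // must precede ID to win ties
ID      : [a-z]+ ;
WS      : [ \t\r\n]+ -> skip ;

// Input:  hello helloworld fred
// Tokens: KEYWORD('hello')  ID('helloworld')  ID('fred')
//   'hello'      -> KEYWORD and ID both match 5 characters; KEYWORD is listed first
//   'helloworld' -> ID matches 10 characters, KEYWORD only 5; the longer match wins
```

If ID were listed above KEYWORD, every lowercase word would come out as ID and KEYWORD would never be produced.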
In the generated SentenceContext, Token fields key and id will be created and, after parsing, will hold the matched tokens, allowing easy positional identification.
Upvotes: 3