Stacky

How to parse keywords as normal words some of the time in ANTLR4

I have a language with keywords like hello that are only keywords in certain types of sentences. In other types of sentences, these words should be treated as ordinary words, for example matched as an ID. Here's a super simple grammar that tells the story:

grammar Hello;

file : ( sentence )* ;
sentence : 'hello' ID PERIOD
         | INT ID PERIOD;

ID  : [a-z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
PERIOD : '.' ;

I'd like these sentences to be valid:

hello fred.
31 cheeseburgers.
6 hello.

But that last sentence doesn't parse with this grammar. The word hello is lexed as the implicit 'hello' token type, not as ID. It seems like the lexer grabs every hello and turns it into a token of that type.

Here's a crazy way to do it, to explain what I want:

sentence : 'hello' ID PERIOD
         | INT crazyID PERIOD;

crazyID : ID | 'hello' ;

But in my real language there are a lot of keywords like hello to deal with, so, yeah, that approach seems crazy.

Is there a reasonable, compact, target-language-independent way to handle this?


Answers (1)

GRosenberg

A standard way of handling keywords:

file     : ( sentence )* EOF ;
sentence : key=( KEYWORD | INT ) id=( KEYWORD | ID ) PERIOD ;

KEYWORD : 'hello' | 'goodbye' ; // list others as alts
PERIOD  : '.' ;
ID      : [a-z]+ ;
INT     : [0-9]+ ;
WS      : [ \t\r\n]+ -> skip ;

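The apparent ambiguity between the KEYWORD and ID rules is resolved by rule order: when two lexer rules match input of the same length, ANTLR picks the rule listed first, so hello becomes a KEYWORD. A longer match still wins, so hellos is lexed as an ID.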
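To see the tie-break in action without running ANTLR, here is a small Python simulation of how the lexer picks a token. It is only an illustration, not the ANTLR runtime; `Token`, `RULES` and `tokenize` are hypothetical names. It assumes the two rules ANTLR lexers follow: the longest match wins, and among matches of equal length the rule listed first wins.

```python
import re
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    type: str  # token type name, e.g. "KEYWORD"
    text: str  # matched input text


# Hypothetical table mirroring the answer's lexer rules, in grammar order.
# Note: Python's `|` takes the first alternative that matches, while ANTLR
# takes the longest; these patterns have no prefix overlap, so it is safe here.
RULES: List[Tuple[str, str]] = [
    ("KEYWORD", r"hello|goodbye"),
    ("PERIOD", r"\."),
    ("ID", r"[a-z]+"),
    ("INT", r"[0-9]+"),
    ("WS", r"[ \t\r\n]+"),
]

SKIPPED = {"WS"}  # mirrors `-> skip`


def tokenize(text: str, rules: List[Tuple[str, str]] = RULES) -> List[Token]:
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed earlier is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in SKIPPED:
            tokens.append(Token(best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With this table, `tokenize("6 hello.")` yields INT, KEYWORD, PERIOD, while `tokenize("6 hellos.")` yields INT, ID, PERIOD because the longer ID match wins. Passing a rule list with ID moved above KEYWORD makes hello lex as ID everywhere, which is why the keyword rule has to come first.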

In the generated SentenceContext class, the labels key and id become Token fields that hold the matched tokens once the sentence is parsed. Checking a token's type then tells you whether a word such as hello appeared in the keyword position or the identifier position.
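For readers who want to try the idea without generating a parser, the sketch below hand-codes what the `sentence` rule and its labels do. It assumes the input is already tokenized; `Token`, `Sentence`, `parse_file` and `describe` are hypothetical names, and `describe` is just one made-up way of acting on the labels. In a real ANTLR parser you would read the same information from the context object's `key` and `id` fields, for example in a listener or visitor.

```python
from typing import List, NamedTuple, Set


class Token(NamedTuple):
    type: str  # token type name, e.g. "KEYWORD"
    text: str


class Sentence(NamedTuple):
    key: Token  # plays the role of the `key=` label
    id: Token   # plays the role of the `id=` label


def parse_file(tokens: List[Token]) -> List[Sentence]:
    """Hand-written stand-in for:
    file     : sentence* EOF ;
    sentence : key=(KEYWORD|INT) id=(KEYWORD|ID) PERIOD ;
    """
    sentences: List[Sentence] = []
    pos = 0

    def expect(allowed: Set[str]) -> Token:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos].type not in allowed:
            found = tokens[pos].type if pos < len(tokens) else "EOF"
            raise SyntaxError(f"expected one of {sorted(allowed)}, found {found}")
        tok = tokens[pos]
        pos += 1
        return tok

    while pos < len(tokens):
        key = expect({"KEYWORD", "INT"})
        ident = expect({"KEYWORD", "ID"})  # a keyword is allowed as a plain word here
        expect({"PERIOD"})
        sentences.append(Sentence(key, ident))
    return sentences


def describe(s: Sentence) -> str:
    # The key token's type says which kind of sentence this is; the id token's
    # text is used as an ordinary word whether it was lexed as KEYWORD or ID.
    if s.key.type == "KEYWORD":
        return f"{s.key.text} -> {s.id.text}"
    return f"{int(s.key.text)} x {s.id.text}"
```

Because the id position accepts KEYWORD as well as ID, `6 hello.` parses. The same grammar also accepts combinations such as `hello goodbye.`, so any stricter rule about which keyword may appear where has to be checked after parsing.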
