T.R.

Reputation: 31

ANTLR V4 lexer lookahead regex

I'm trying to write a grammar for an XML-like language that uses << instead of < characters. This is a partial snippet of the lexer, where TEXT represents the text between (outside) tags:

OPEN  : '<<' ;
CLOSE : '>>' ;
TEXT  : ~[^<]+ ;

The definition for TEXT above is clearly wrong, because it will stop at the first occurrence of <, even when that < is not followed by another <. I am looking for a way to say "capture everything until you encounter a <<", without including the << in the match.

So something like this won't work either:

TEXT  : .*? '<<' ;

Is there a way to accomplish that in ANTLR4?

-- TR

Upvotes: 3

Views: 1441

Answers (1)

Lucas Trzesniewski

Reputation: 51330

No need for a lookahead here; the following should do the trick:

TEXT  : ( ~'<' | '<' ~'<' )+ ;

That is: match one or more items, where each item is either a non-< character or a single < followed by any character other than <.
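As a quick sanity check outside ANTLR, the same rule can be emulated with an equivalent regular expression. This is only a sketch in Python, not ANTLR itself: the names TEXT_RE and match_text are hypothetical helpers made up for illustration, and the regex mimics just the TEXT rule, not the full lexer (no OPEN/CLOSE handling, no longest-match arbitration between rules).

```python
import re

# Regex equivalent of the lexer rule  TEXT : ( ~'<' | '<' ~'<' )+ ;
#   [^<]    any character other than '<'
#   <[^<]   a single '<' followed by a character other than '<'
# TEXT_RE and match_text are illustrative names, not part of ANTLR.
TEXT_RE = re.compile(r"(?:[^<]|<[^<])+")


def match_text(source: str, pos: int = 0) -> str:
    """Return what the TEXT rule would consume starting at pos ('' if nothing)."""
    m = TEXT_RE.match(source, pos)
    return m.group(0) if m else ""


if __name__ == "__main__":
    for sample in ["a < b and more<<tag>>", "<<tag>>", "1 <2<<x"]:
        print(repr(sample), "->", repr(match_text(sample)))
```

A lone < is kept as part of the text, while the match stops right before a <<. Note that the rule also consumes >>, since > is not <: match_text("<<tag>>", 2) yields "tag>>", which is worth keeping in mind when tokens inside tags need to be told apart from plain text.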

By the way, ANTLR's syntax for negated character classes is different: write ~[a-z] instead of [^a-z], for instance. In ANTLR, [^<] is a set containing the two characters ^ and <, so ~[^<] excludes ^ as well as <.

You may also want to take a look at the XML example grammar: it uses lexer modes to differentiate tokens inside tags, which may also prove useful for your grammar.

Upvotes: 5
