Reputation: 31
I'm trying to write a grammar for an XML-like language in which tags are delimited by << and >> instead of < and >. This is a partial snippet of the lexer, where TEXT represents the text between (outside) tags:
OPEN : '<<' ;
CLOSE : '>>' ;
TEXT : ~[^<]+ ;
The definition for TEXT above is clearly wrong, because it will stop at the first occurrence of < even when that < is not followed by another <. I am looking for a way to say "capture everything until you encounter a <<" without including the << in the match.
So something like this won't work either:
TEXT : .*? '<<' ;
Is there a way to accomplish that in ANTLR4?
-- TR
Upvotes: 3
Views: 1441
Reputation: 51330
No need for a lookahead here; the following should do the trick:
TEXT : ( ~'<' | '<' ~'<' )+ ;
That is: match a series of non-< characters, or a single < followed by something else.
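As a quick sanity check of that alternation outside ANTLR, here is a minimal Python sketch that uses a regular expression as a stand-in for the lexer rule. The regex is only an analogue of the rule, not ANTLR itself, and the names TEXT_PATTERN and leading_text are made up for this illustration.

```python
import re

# Regex analogue of the lexer rule  TEXT : ( ~'<' | '<' ~'<' )+ ;
#   [^<]    mirrors ~'<'       (any character except '<')
#   <[^<]   mirrors '<' ~'<'   (a '<' followed by a non-'<' character)
# This is an illustration only; ANTLR's lexer is not a regex engine.
TEXT_PATTERN = re.compile(r"(?:[^<]|<[^<])+")


def leading_text(source: str) -> str:
    """Return the TEXT-like prefix of source, or '' if it starts with '<<'."""
    match = TEXT_PATTERN.match(source)
    return match.group(0) if match else ""


if __name__ == "__main__":
    # A single '<' stays inside the text; the match stops before '<<'.
    print(repr(leading_text("a < b and c <<tag>>")))
    # Input that begins with '<<' yields no text at all.
    print(repr(leading_text("<<tag>>")))
```

In this sketch, a lone < at the very end of the input is left out of the match, because the second alternative needs a character after the <.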
By the way, ANTLR's syntax is different for negative character classes. You should write ~[a-z] instead of [^a-z], for instance.
You may also want to take a look at the XML example grammar. It uses lexer modes to differentiate tokens inside tags from those outside, which may also prove useful for your grammar.
Upvotes: 5