user14781215

Reputation: 35

Antlr4: Skip line when it starts with * unless the second char is

In my input, a line that starts with * is a comment line unless it starts with *+ or *-. I can ignore the comments, but I need to capture the other lines.

These are my lexer rules:

WhiteSpaces : [ \t]+;
Newlines    : [\r\n]+;
Comment     : '*' .*? Newlines -> skip ;
SkipTokens  : (WhiteSpaces | Newlines) -> skip;

An example:

* this is a comment line
** another comment line
*+ type value

So the first two are comment lines, and I can skip them. But I don't know how to define a lexer/parser rule that can catch the last line.

Upvotes: 2

Views: 771

Answers (1)

Bart Kiers

Reputation: 170148

Your SkipTokens lexer rule will never be matched because the rules WhiteSpaces and Newlines are placed before it. See this Q&A for an explanation of how the lexer matches tokens: ANTLR Lexer rule only seems to work as part of parser rule, and not part of another lexer rule
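To illustrate the ordering issue in isolation, here is a minimal sketch (the grammar name OrderDemo and the rules ID and IF are made up for this example and are not part of your grammar). The lexer prefers the rule that matches the most characters, and when two rules match the same number of characters, the one defined first wins:

```antlr
lexer grammar OrderDemo;   // hypothetical name, for illustration only

// For the input "if", both rules match exactly 2 characters.
// The tie goes to the rule defined first, so the lexer emits an ID token
// and the IF rule is never chosen.
ID : [a-z]+ ;
IF : 'if' ;
```

Your SkipTokens rule is in the same position as IF here: anything it can match is matched with the same length by WhiteSpaces or Newlines, which are defined earlier.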

For it to work as you expect, do this:

SkipTokens  : (WhiteSpaces | Newlines) -> skip;

fragment WhiteSpaces : [ \t]+;
fragment Newlines    : [\r\n]+;

For what a fragment is, see this Q&A: What does "fragment" mean in ANTLR?

Now, for your question. You defined a Comment rule that must always end with a line break. This means a comment on the last line of your input will not be matched if the input does not end with a line break. So you should let a comment end with either a line break or the EOF.

Something like this should do the trick:

COMMENT
 : '*' ~[+\-\r\n] ~[\r\n]* // a '*' must be followed by something other than '+', '-' or a line break
 | '*' ( [\r\n]+ | EOF )   // a '*' is a valid comment if directly followed by a line break, or the EOF
 ;

STAR_MINUS
 : '*-'
 ;

STAR_PLUS
 : '*+'
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

This, of course, does not require the * to be at the start of a line. If you want that, check out this Q&A: Handle strings starting with whitespaces
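To actually capture the *+ and *- lines, a parser rule can be layered on top of these tokens. Below is a minimal sketch, assuming the rest of such a line consists of simple words. The grammar name StarLines, the parser rules file and directive, the WORD token, and the -> skip on COMMENT are my assumptions and not something taken from your grammar:

```antlr
grammar StarLines;   // hypothetical grammar name

// Parser rules (names are illustrative)
file
 : directive* EOF
 ;

directive
 : ( STAR_PLUS | STAR_MINUS ) WORD+   // e.g. "*+ type value"
 ;

// Lexer rules
COMMENT
 : ( '*' ~[+\-\r\n] ~[\r\n]*     // '*' followed by anything except '+', '-' or a line break
   | '*' ( [\r\n]+ | EOF )       // a lone '*' directly followed by a line break or the EOF
   ) -> skip                     // parentheses make the skip apply to both alternatives
 ;

STAR_MINUS
 : '*-'
 ;

STAR_PLUS
 : '*+'
 ;

WORD
 : [a-zA-Z0-9_]+                 // assumed shape of "type" and "value"
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;
```

With your three example lines, the first two should be discarded as COMMENT tokens, and the third should produce STAR_PLUS WORD WORD, which the directive rule matches. Since line breaks are skipped, a directive simply ends where the next *+ or *- (or the EOF) begins. Also note that a * appearing later on a *+ line would still start a COMMENT token, for the reason mentioned above.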

Upvotes: 1
