Reputation: 3426
I have to create a program that counts lines of code ignoring those inside a comment. I'm a newbie working with Antlr, and after trying a lot, the nearest I came to a solution is this erroneous grammar:
grammar Comments;
comment : startc content endc;
startc : '/*';
endc : '*/';
content : newline | contenttext;
contenttext : CONTENTCHARS+;
newline : '\r\n';
CONTENTCHARS
: ~'*' '/'
| ~'/' .
;
WS : [ \r\t]+ -> skip;
If I try with /*hello\r\nworld*/
the parser recognizes this, which is erroneous:
In order to count lines, the parser needs to detect newline characters, inside and outside multiline comments. I think my problem is that I don't know how to say "match everything inside /*
and */
except \r\n
.
Please, can you point me in the right direction? Any help will be appreciated.
Upvotes: 1
Views: 3315
Reputation: 1003
Let's simplify your grammar! In the grammar we will ignore whitespace characters and comments at the lexer stage (and the unwanted newlines at the same time!). For example the COMMENT
section will match one line comments or multi-line comments and just skip them!
Next, we will introduce counter
variable for counting NEWLINE
tokens that are used only in content
grammar rule (because COMMENT
token is skipped so the NEWLINE
token in it!).
Whenever we encounter a NEWLINE
token we increment the counter
variable.
grammar Comments;
@lexer::members {
int counter = 0;
}
WS : [ \r\t]+ -> skip;
COMMENT : '/*' .*? '*/' NEWLINE? -> skip;
TEXT : [a-zA-Z0-9]+;
NEWLINE : '\r'? '\n' { {System.out.println("Newlines so far: " + (++counter)); } };
content: (TEXT | COMMENT | NEWLINE )* EOF;
Upvotes: 3