Antlr4: How can I match end of lines inside multiline comments?

Question

I have to create a program that counts lines of code, ignoring those inside a comment. I'm a newbie working with Antlr, and after trying a lot, the closest I have come to a solution is this erroneous grammar:

grammar Comments;
comment         :   startc content endc;
startc          :   '/*';
endc            :   '*/';
content         :   newline | contenttext;
contenttext     :   CONTENTCHARS+;
newline         :   '\n';
CONTENTCHARS
    :   ~'*' '/'
    |   ~'/' .
    ;
WS              :   [ \r\t]+ -> skip;

If I try /*hello world*/, the parser does recognize it, but the result is erroneous.

In order to count lines, the parser needs to detect newline characters both inside and outside multiline comments. I think my problem is that I don't know how to say "match everything between /* and */ except \n".

Please, can you point me in the right direction? Any help will be appreciated.

quepas · Accepted Answer

Solution

Let's simplify your grammar! We will ignore whitespace and comments at the lexer stage (and, with them, the unwanted newlines inside comments). The COMMENT rule matches any /* ... */ comment, whether it fits on one line or spans several, together with an optional trailing newline, and simply skips it.
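The key to the COMMENT rule is the non-greedy loop `.*?`, which stops at the first `*/` instead of the last one. As a rough analogy outside ANTLR (this uses `java.util.regex`, not the ANTLR lexer, and the class name `NonGreedyDemo` is hypothetical), the same pattern compiled with `DOTALL` spans line breaks but does not swallow the code between two comments:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonGreedyDemo {
    // Analogy for the lexer rule '/*' .*? '*/': DOTALL lets '.' match line breaks,
    // and the reluctant quantifier '*?' stops at the first closing "*/".
    static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Returns the first block comment in the input, or null if there is none.
    static String firstComment(String input) {
        Matcher m = COMMENT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String src = "/* a\nb */ x = 1; /* c */";
        // Prints only the first comment (which contains a line break),
        // not everything up to the last "*/".
        System.out.println(firstComment(src));
    }
}
```

With a greedy `.*` instead, the match would run to the final `*/` and hide `x = 1;` as if it were part of a comment.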

Next, we introduce a counter variable for counting NEWLINE tokens. Only newlines outside comments become the NEWLINE tokens used in the content rule: because the whole COMMENT token is skipped, the newlines inside it are skipped too.

Whenever the lexer produces a NEWLINE token, its action increments the counter variable and prints the running total.

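grammar Comments;

@lexer::members {
    int counter = 0;
}

// Spaces, tabs and carriage returns are skipped; '\n' is left for NEWLINE.
WS : [ \r\t]+ -> skip;
// A block comment (single- or multi-line) plus one optional trailing newline is skipped whole.
COMMENT : '/*' .*? '*/' NEWLINE? -> skip;
TEXT : [a-zA-Z0-9]+;
// Every newline outside a comment increments the counter.
NEWLINE : '\r'? '\n' { System.out.println("Newlines so far: " + (++counter)); };

content: (TEXT | COMMENT | NEWLINE )* EOF;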
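To check the counting logic without generating a parser, here is a plain-Java sketch that mimics the lexer rules above by hand (no ANTLR runtime is involved, and `NewlineCounter` / `countNewlinesOutsideComments` are hypothetical names). It counts the line breaks that would become NEWLINE tokens. Two assumptions differ from ANTLR: an unterminated comment is treated as running to the end of input, and characters outside `[a-zA-Z0-9]` and whitespace are silently ignored, whereas the generated lexer would report token recognition errors for both.

```java
public class NewlineCounter {
    // Hand-written approximation of the Comments lexer: counts line breaks
    // outside block comments, the way NEWLINE tokens are counted.
    static int countNewlinesOutsideComments(String src) {
        int counter = 0;
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                // Mirrors COMMENT : '/*' .*? '*/' by stopping at the first closing "*/".
                int end = src.indexOf("*/", i + 2);
                // Assumption: an unterminated comment runs to the end of input.
                i = (end < 0) ? src.length() : end + 2;
                // Mirrors the trailing NEWLINE? : one line break right after the comment is skipped too.
                if (src.startsWith("\r\n", i)) {
                    i += 2;
                } else if (src.startsWith("\n", i)) {
                    i += 1;
                }
            } else if (src.charAt(i) == '\n') {
                // Mirrors NEWLINE : '\r'? '\n' (a preceding '\r' was already stepped over below).
                counter++;
                i++;
            } else {
                // WS, TEXT and any other character: nothing to count.
                i++;
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        String src = "int a\n/* line one\n   line two */\nint b\n";
        // prints "Newlines outside comments: 2"
        System.out.println("Newlines outside comments: " + countNewlinesOutsideComments(src));
    }
}
```

Following the grammar, the line break right after a comment is swallowed even when code precedes the comment on the same line, so `x /* c */` followed by a newline is not counted; and since every NEWLINE token is counted, blank lines are counted as well.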
