amedina

Reputation: 3426

Antlr4: How can I match end of lines inside multiline comments?

I have to create a program that counts lines of code ignoring those inside a comment. I'm a newbie working with Antlr, and after trying a lot, the nearest I came to a solution is this erroneous grammar:

grammar Comments;
comment         :   startc content endc;
startc          :   '/*';
endc            :   '*/';
content         :   newline | contenttext;
contenttext     :   CONTENTCHARS+;
newline         :   '\r\n';
CONTENTCHARS
    :   ~'*' '/'
    |   ~'/' .
    ;
WS              :   [ \r\t]+ -> skip;

If I try it with /*hello\r\nworld*/, what the parser recognizes is wrong.

In order to count lines, the parser needs to detect newline characters, both inside and outside multiline comments. I think my problem is that I don't know how to say "match everything inside /* and */ except \r\n".

Please, can you point me in the right direction? Any help will be appreciated.

Upvotes: 1

Views: 3315

Answers (1)

quepas

Reputation: 1003

Solution

Let's simplify your grammar! We will skip whitespace and comments at the lexer stage, which also discards the unwanted newlines inside comments. The COMMENT rule matches a /* ... */ comment, whether it sits on one line or spans several, and simply skips it.

Next, we introduce a counter variable to count NEWLINE tokens. Only newlines outside comments produce NEWLINE tokens, because a newline inside a comment is consumed as part of the skipped COMMENT token.

Whenever we encounter a NEWLINE token we increment the counter variable.

grammar Comments;

@lexer::members {
    int counter = 0;
}

WS : [ \r\t]+ -> skip;
COMMENT : '/*' .*? '*/' NEWLINE? -> skip;
TEXT : [a-zA-Z0-9]+;
NEWLINE : '\r'? '\n' { System.out.println("Newlines so far: " + (++counter)); };

content: (TEXT | NEWLINE)* EOF;
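
To check which newlines this grammar should count without generating a parser, here is a rough plain-Java sketch that imitates the COMMENT and NEWLINE rules by hand. It does not use ANTLR at all; the class name CommentNewlineCounter and the method countNewlines are made up for illustration, and treating an unterminated comment as running to the end of the input is a simplification, not what the ANTLR lexer would do.

```java
// Hypothetical helper, not part of ANTLR: a hand-written imitation of the
// COMMENT and NEWLINE lexer rules, useful for checking expected counts.
class CommentNewlineCounter {

    // Counts the '\n' characters that would become NEWLINE tokens, i.e. those
    // not consumed by a /* ... */ comment or by the optional newline after it.
    static int countNewlines(String input) {
        int count = 0;
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith("/*", i)) {
                int close = input.indexOf("*/", i + 2);
                if (close < 0) {
                    // Assumption: an unterminated comment swallows the rest.
                    break;
                }
                i = close + 2;
                // Mirrors the NEWLINE? at the end of the COMMENT rule.
                if (input.startsWith("\r\n", i)) {
                    i += 2;
                } else if (input.startsWith("\n", i)) {
                    i += 1;
                }
            } else {
                // A lone '\r' is whitespace; only '\n' ends a line.
                if (input.charAt(i) == '\n') {
                    count++;
                }
                i++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "first\r\n/*hello\r\nworld*/\r\nlast\r\n";
        System.out.println(countNewlines(sample)); // prints 2
    }
}
```

Note that the newline directly after a closing */ is not counted, because the COMMENT rule consumes it through NEWLINE?. If a line that ends with a comment should still count, dropping NEWLINE? from the COMMENT rule would be the change to try.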

Upvotes: 3
