Reputation: 4523
How to handle nested comments in antlr4 lexer? ie I need to count the number of "/*" inside this token and close only after the same number of "*/" have been received. As an example, the D language has such nested comments as "/+ ... +/"
For example, the following lines should be treated as one block of comments:
/* comment 1
comment 2
/* comment 3
comment 4
*/
// comment 5
comment 6
*/
My current code is the following, and it does not work on the above nested comment:
COMMENT : '/*' .*? '*/' -> channel(HIDDEN)
;
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' -> channel(HIDDEN)
;
Upvotes: 6
Views: 5457
Reputation: 4585
Terence Parr has these two lexer lines in his Swift Antlr4 grammar for lexing out nested comments:
COMMENT : '/*' (COMMENT|.)*? '*/' -> channel(HIDDEN) ;
LINE_COMMENT : '//' .*? '\n' -> channel(HIDDEN) ;
Upvotes: 17
Reputation: 151
COMMENT_NEST
: '/*'
( ('/'|'*'+)? ~[*/] | COMMENT_NEST | COMMENT_INL )*?
('/'|'*'+?)?
'*/'
;
COMMENT_INL
: '//' ( COMMENT_INL | ~[\n\r] )*
;
Upvotes: 0
Reputation: 11
Works for Antlr3.
Allows nested comments and '*' within a comment.
fragment
F_MultiLineCommentTerm
:
( {LA(1) == '*' && LA(2) != '/'}? => '*'
| {LA(1) == '/' && LA(2) == '*'}? => F_MultiLineComment
| ~('*')
)*
;
fragment
F_MultiLineComment
:
'/*'
F_MultiLineCommentTerm
'*/'
;
H_MultiLineComment
: r= F_MultiLineComment
{ $channel=HIDDEN;
printf(stder,"F_MultiLineComment[\%s]",$r->getText($r)->chars);
}
;
Upvotes: 1
Reputation: 509
I'm using:
COMMENT: '/*' ('/'*? COMMENT | ('/'* | '*'*) ~[/*])*? '*'*? '*/' -> skip;
This forces any /*
inside a comment to be the beginning of a nested comment and similarly with */
. In other words, there's no way to recognize /*
and */
other than at the beginning and end of the rule COMMENT
.
This way, something like /* /* /* */ a */
would not be recognized entirely as a (bad) comment (mismatched /*
s and */
s), as it would if using COMMENT: '/*' (COMMENT|.)*? '*/' -> skip;
, but as /
, followed by *
, followed by correct nested comments /* /* */ a */
.
Upvotes: 4
Reputation: 53572
I can give you an ANTLR3 solution, which you can adjust to work in ANTLR4:
I think you can use a recursive rule invocation. Make a non-greedy comment rule for /* ... */ which calls itself. That should allow for unlimited nesting without having to count opening + closing comment markers:
COMMENT option { greedy = false; }:
('/*' ({LA(1) == '/' && LA(2) == '*'} => COMMENT | .) .* '*/') -> channel(HIDDEN)
;
or maybe even:
COMMENT option { greedy = false; }:
('/*' .* COMMENT? .* '*/') -> channel(HIDDEN)
;
I'm not sure if ANTLR properly chooses the right path depending on any char or the comment introducer. Try it out.
Upvotes: 0