R71

Reputation: 4523

How to handle nested comments in an ANTLR lexer

How can I handle nested comments in an ANTLR4 lexer? That is, I need to count the number of "/*" occurrences inside the token and close it only after the same number of "*/" have been seen. As an example, the D language has nested comments of the form "/+ ... +/".

For example, the following lines should be treated as one block of comments:

/* comment 1
   comment 2
   /* comment 3
      comment 4
   */
   // comment 5
   comment 6
*/

My current code is the following, and it does not work on the above nested comment:

COMMENT : '/*' .*? '*/' -> channel(HIDDEN)
        ;
LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n'  -> channel(HIDDEN)
        ;
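
To illustrate why the non-greedy rule above falls short, here is a minimal Python sketch. It is an analogy only: Python's `re` engine is not ANTLR's lexer, and the names `NON_GREEDY_COMMENT` and `SAMPLE` are made up for this example. A non-greedy `.*?` stops at the first "*/" it can reach, so the outer comment is cut off where the inner one closes.

```python
import re

# Rough Python analogue of the lexer rule '/*' .*? '*/'.
# re.DOTALL lets '.' match newlines, as the ANTLR wildcard does.
NON_GREEDY_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# The nested comment from the question.
SAMPLE = (
    "/* comment 1\n"
    "   comment 2\n"
    "   /* comment 3\n"
    "      comment 4\n"
    "   */\n"
    "   // comment 5\n"
    "   comment 6\n"
    "*/\n"
)

match = NON_GREEDY_COMMENT.match(SAMPLE)
matched_text = match.group(0)    # ends at the FIRST '*/', after "comment 4"
leftover = SAMPLE[match.end():]  # "comment 5", "comment 6" and the final '*/'

print(repr(matched_text))
print(repr(leftover))
```

The leftover text still contains "comment 6" and the final "*/", which the lexer would then try to tokenize as ordinary input.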

Upvotes: 6

Views: 5457

Answers (5)

mikebridge

Reputation: 4585

Terence Parr uses these two lexer rules in his Swift ANTLR4 grammar to lex nested comments:

COMMENT : '/*' (COMMENT|.)*? '*/' -> channel(HIDDEN) ;
LINE_COMMENT  : '//' .*? '\n' -> channel(HIDDEN) ;
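
For well-nested input, the recursive rule accepts the same text as the counting approach described in the question. The following Python sketch shows that counting approach so grammar results can be cross-checked against it. It is an assumption-laden model, not ANTLR's matching algorithm: the function name `match_nested_comment` and its parameters are hypothetical, and it does not reproduce how ANTLR treats unbalanced input.

```python
from typing import Optional


def match_nested_comment(text: str, start: int = 0,
                         open_tok: str = "/*",
                         close_tok: str = "*/") -> Optional[int]:
    """Return the index just past a balanced comment at `start`, or None."""
    if not text.startswith(open_tok, start):
        return None  # no comment begins here
    depth = 0
    i = start
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1               # one more comment is open
            i += len(open_tok)
        elif text.startswith(close_tok, i):
            depth -= 1               # innermost open comment closes
            i += len(close_tok)
            if depth == 0:
                return i             # outermost comment just closed
        else:
            i += 1                   # ordinary comment body character
    return None                      # input ended with comments still open


source = "/* a /* b */ // c\n d */ rest"
end = match_nested_comment(source)
print(repr(source[:end]))  # the whole nested comment
print(repr(source[end:]))  # what follows it
```

Passing `"/+"` and `"+/"` as `open_tok` and `close_tok` gives the D-style delimiters mentioned in the question.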

Upvotes: 17

Dennis Ashley

Reputation: 151

  1. This handles '/*/*/' and '/*.../*/', where the comment bodies are '/' and '.../' respectively.
  2. Multiline comments do not nest inside single-line comments, so you cannot start or end a multiline comment inside a single-line comment.
    • This is not a valid comment: '/* // */'.
    • You need a newline to end the single-line comment before the '*/' can be consumed to end the multiline comment.
    • This is a valid comment: '/* // */ \n /*/'.
    • The comment body is: ' // */ \n /'. As you can see, the complete single-line comment is included in the body of the multiline comment.
  3. Although '/*/' can end a multiline comment if the preceding character is '*', the comment ends on the first '/', and the remaining '*/' must then close a nested comment; otherwise there is an error. The shortest path wins because the match is non-greedy.
    • This is not a valid comment: /****/*/
    • This is a valid comment: /*/****/*/. The comment body is /****/, which is itself a nested comment.
  4. The prefix and suffix are never matched inside the multiline comment body.
  5. To implement this for the D language, change the '*' to '+'.

COMMENT_NEST : '/*' ( ('/'|'*'+)? ~[*/] | COMMENT_NEST | COMMENT_INL )*? ('/'|'*'+?)? '*/' ;

COMMENT_INL : '//' ( COMMENT_INL | ~[\n\r] )* ;

Upvotes: 0

Douglas

Reputation: 11

This works for ANTLR3.

It allows nested comments and '*' within a comment.

fragment
F_MultiLineCommentTerm
:
(   {LA(1) == '*' && LA(2) != '/'}? => '*'
|   {LA(1) == '/' && LA(2) == '*'}? => F_MultiLineComment
|   ~('*') 
)*
;   

fragment
F_MultiLineComment
:
'/*' 
F_MultiLineCommentTerm
'*/'
;   

H_MultiLineComment
:   r=  F_MultiLineComment
    {   $channel=HIDDEN;
        fprintf(stderr, "F_MultiLineComment[\%s]", $r->getText($r)->chars);
    }
;

Upvotes: 1

KinGamer

Reputation: 509

I'm using:

COMMENT: '/*' ('/'*? COMMENT | ('/'* | '*'*) ~[/*])*? '*'*? '*/' -> skip;

This forces any /* inside a comment to be the beginning of a nested comment and similarly with */. In other words, there's no way to recognize /* and */ other than at the beginning and end of the rule COMMENT.

This way, something like /* /* /* */ a */ is not recognized in its entirety as a (bad) comment with mismatched /*s and */s, which is what happens with COMMENT: '/*' (COMMENT|.)*? '*/' -> skip;. Instead it is lexed as /, followed by *, followed by the correctly nested comment /* /* */ a */.
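
The strict behavior described here can be modeled in a few lines of Python. This is a sketch, not ANTLR's algorithm: the helper names `balanced_end` and `strict_tokens` are hypothetical, and the `CHAR` tokens stand in for whatever other lexer rules would match the leftover characters in a real grammar.

```python
from typing import List, Optional, Tuple


def balanced_end(text: str, start: int) -> Optional[int]:
    """Index just past a balanced /* ... */ starting at `start`, else None."""
    if not text.startswith("/*", start):
        return None
    depth, i = 0, start
    while i < len(text):
        if text.startswith("/*", i):
            depth, i = depth + 1, i + 2
        elif text.startswith("*/", i):
            depth, i = depth - 1, i + 2
            if depth == 0:
                return i
        else:
            i += 1
    return None  # ran out of input with comments still open


def strict_tokens(text: str) -> List[Tuple[str, str]]:
    """Split text into balanced COMMENT tokens and one-character CHAR tokens."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        end = balanced_end(text, i)
        if end is None:
            tokens.append(("CHAR", text[i]))  # no balanced comment starts here
            i += 1
        else:
            tokens.append(("COMMENT", text[i:end]))
            i = end
    return tokens


for kind, value in strict_tokens("/* /* /* */ a */"):
    print(kind, repr(value))
```

On the example input, the first `/*` never finds its closing partner, so its two characters (and the following space) come out as separate tokens before the balanced comment `/* /* */ a */`.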

Upvotes: 4

Mike Lischke

Reputation: 53572

I can give you an ANTLR3 solution, which you can adjust to work in ANTLR4:

I think you can use a recursive rule invocation. Make a non-greedy comment rule for /* ... */ which calls itself. That should allow for unlimited nesting without having to count opening + closing comment markers:

COMMENT option { greedy = false; }:
    ('/*' ({LA(1) == '/' && LA(2) == '*'} => COMMENT | .) .* '*/') -> channel(HIDDEN)
;

or maybe even:

COMMENT option { greedy = false; }:
    ('/*' .* COMMENT? .* '*/') -> channel(HIDDEN)
;

I'm not sure if ANTLR properly chooses the right path depending on any char or the comment introducer. Try it out.

Upvotes: 0
