Reputation: 25
I'm new to ANTLR, so I'd appreciate a detailed explanation.
I have a lexer rule for block comments, /* comment */ (BC), in ANTLR. I want it to tokenize like this:
/* sample */ => BC
/* s
a
m
p
l
e */ => BC
"" => STRING
" " => STRING
"a" => STRING
"hello world \1" => STRING
but I got this:
/* sample */
/* s
a
m
p
l
e */ => BC
""
" "
"a"
"hello world \1" => STRING
It only matches from the first /* to the last */, and the same thing happens with my STRING token. Here's the rule for comments:
BC: '/*'.*'*/';
And the String:
STRING: '"'(~('"')|(' '|'\b'|'\f'|'r'|'\n'|'\t'|'\"'|'\\'))*'"';
Upvotes: 1
Views: 1597
Reputation: 4481
You can also use the following fragment, which avoids the non-greedy syntax by using lexer modes (a more general solution):
MultilineCommentStart: '/*' -> more, mode(COMMENTS);
mode COMMENTS;
MultilineComment: '*/' -> mode(DEFAULT_MODE);
MultilineCommentNotAsterisk: ~'*'+ -> more;
MultilineCommentAsterisk: '*' -> more;
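To make the mode switching easier to follow, here is a rough Python sketch of the same idea. It is not ANTLR and not how ANTLR is implemented; the function name lex_block_comments and the mode strings are made up for illustration. It only tracks block comments and ignores every other token.

```python
def lex_block_comments(text):
    """Return the block comments found in text, scanning with two modes."""
    comments = []
    mode = "DEFAULT"
    start = 0  # index where the current comment began
    i = 0
    while i < len(text):
        if mode == "DEFAULT":
            if text.startswith("/*", i):
                # Roughly: '/*' -> more, mode(COMMENTS)
                mode = "COMMENTS"
                start = i
                i += 2
            else:
                # Everything else is skipped in this sketch.
                i += 1
        else:
            if text.startswith("*/", i):
                # Roughly: '*/' -> mode(DEFAULT_MODE); emit the accumulated text.
                i += 2
                comments.append(text[start:i])
                mode = "DEFAULT"
            else:
                # Roughly: ~'*'+ -> more, or '*' -> more
                i += 1
    return comments


sample = '/* sample */\n/* s\na\nm */\n"a"'
print(lex_block_comments(sample))
```

Because the scanner leaves comment mode at the first */ it meets, each comment ends up as its own token without any non-greedy operator.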
Upvotes: 1
Reputation: 2698
Loops such as .* in lexer rules are greedy by default: they consume the longest sequence that still lets the rule match. That is why BC runs all the way to the last closing delimiter in the input.
To make the loop non-greedy, add ? after the *:
BC: '/*' .*? '*/';
This stops at the first closing */, which is exactly what you need.
The same applies to your STRING rule. The Definitive ANTLR 4 Reference covers this on page 285.
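The difference between .* and .*? can also be seen outside ANTLR. The following is a small Python sketch using the standard re module as an analogy; regular expressions are not ANTLR lexer rules, and the sample input string is made up, but the greedy and non-greedy loops behave the same way here.

```python
import re

# Two block comments separated by other text, similar to the question's input.
source = "/* first */ x = 1; /* second */"

# Greedy: .* runs to the LAST "*/", swallowing everything in between.
greedy = re.findall(r"/\*.*\*/", source, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST "*/", giving one match per comment.
non_greedy = re.findall(r"/\*.*?\*/", source, flags=re.DOTALL)

print(greedy)      # one match spanning the whole input
print(non_greedy)  # two matches, one per comment
```

The greedy pattern returns a single match covering both comments and the code between them, which is the behaviour described in the question. The non-greedy pattern returns each comment separately.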
Upvotes: 3