Reputation: 138
I'm trying to write a scanner for JavaScript with JavaCC. I have several problems, one of which is C-style comments: /* … */
I need to return the comments as tokens.
Here is one attempt:
TOKEN: {<MLCOMMENT: "/*" ( ~["*"] | ("*"(~["/"])?) )* "*/">}
TOKEN: {<MLCOMMENT_UNDELIM: ("/*"|"/*/") ( ~["/"] | (~["*"]"/") )* >}
MLCOMMENT was intended to match closed comments, and MLCOMMENT_UNDELIM open-ended comments. This doesn't work because /*a*/b*/ is a longer match for MLCOMMENT than /*a*/, and JavaCC always prefers the longest match.
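The over-matching can be reproduced outside JavaCC. The sketch below is a hand translation of the first MLCOMMENT expression into java.util.regex syntax. The translation and the class name FirstAttemptCheck are my own assumptions, not generated JavaCC code. It only checks whether a whole string belongs to the token's language, which is enough to show that /*a*/b*/ is accepted as a single comment.

```java
import java.util.regex.Pattern;

final class FirstAttemptCheck {
    // Hand translation (an assumption) of the JavaCC expression
    //   "/*" ( ~["*"] | ("*"(~["/"])?) )* "*/"
    static final Pattern ML_COMMENT =
            Pattern.compile("/\\*(?:[^*]|\\*[^/]?)*\\*/");

    // True if the entire string is accepted by the expression.
    static boolean isWholeMatch(String s) {
        return ML_COMMENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWholeMatch("/*a*/"));    // true
        System.out.println(isWholeMatch("/*a*/b*/")); // true: runs past the first "*/"
    }
}
```

The culprit is the optional part in ("*"(~["/"])?): it lets a lone "*" be consumed even when the next character is "/", so the "/" is then picked up by ~["*"] and the comment body continues.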
Here is another attempt at solving this problem:
MORE:
{
"/*" : WithinMLComment
}
< WithinMLComment > TOKEN :
{
< MLCOMMENT: "*/" > : DEFAULT
}
< WithinMLComment > MORE :
{
< ~[] >
}
This doesn't work either, since an open-ended comment hits EOF while the scanner is still in the WithinMLComment state. That's illegal (a TokenMgrError is thrown).
Update: I may have found the solution:
TOKEN: {<MLCOMMENT: ("/*"|"/*/") ( ~["/"] | (~["*"]"/") )* "*/">}
TOKEN: {<MLCOMMENT_UNDELIM: ("/*"|"/*/") ( ~["/"] | (~["*"]"/") )* >}
Update 2:
It wasn't the solution: /**// will be matched by MLCOMMENT_UNDELIM.
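A small check of that failure, again as a sketch: both expressions from the update are translated by hand into java.util.regex (the translation and the class name UpdateAttemptCheck are assumptions for illustration). MLCOMMENT accepts only the four characters /**/, while MLCOMMENT_UNDELIM accepts all five characters of /**//, so a longest-match scanner would pick the undelimited token.

```java
import java.util.regex.Pattern;

final class UpdateAttemptCheck {
    // Hand translation (an assumption) of
    //   ("/*"|"/*/") ( ~["/"] | (~["*"]"/") )* "*/"
    static final Pattern ML_COMMENT =
            Pattern.compile("(?:/\\*|/\\*/)(?:[^/]|[^*]/)*\\*/");

    // Same expression without the closing "*/".
    static final Pattern ML_COMMENT_UNDELIM =
            Pattern.compile("(?:/\\*|/\\*/)(?:[^/]|[^*]/)*");

    // True if the entire string is accepted by the given expression.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(ML_COMMENT, "/**/"));          // true
        System.out.println(accepts(ML_COMMENT, "/**//"));         // false
        System.out.println(accepts(ML_COMMENT_UNDELIM, "/**//")); // true: 5 chars beat 4
    }
}
```

The extra "/" gets in through (~["*"]"/"): the first "/" after the second "*" satisfies ~["*"], and the following "/" completes the pair.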
Upvotes: 2
Views: 2273
Reputation: 16221
For a multiline comment you can use
"/*" (~["*"])* "*" (~["*","/"] (~["*"])* "*" | "*")* "/"
For a multiline comment that is missing the final "*/", you can use
"/*" ( ~["*"] | ("*")+ ~["*","/"] )* ("*")*
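To sanity-check these two expressions against the inputs from the question, here is a rough sketch that translates them by hand into java.util.regex and imitates JavaCC's longest-match rule by comparing prefix lengths. The translation, the class name CommentTokenCheck, and the helper names are assumptions for illustration. This is not how JavaCC's token manager is implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentTokenCheck {
    // "/*" (~["*"])* "*" (~["*","/"] (~["*"])* "*" | "*")* "/"
    static final Pattern CLOSED =
            Pattern.compile("/\\*[^*]*\\*(?:[^*/][^*]*\\*|\\*)*/");

    // "/*" ( ~["*"] | ("*")+ ~["*","/"] )* ("*")*
    static final Pattern OPEN =
            Pattern.compile("/\\*(?:[^*]|\\*+[^*/])*\\**");

    // Length of the match anchored at the start of the input, or -1 if none.
    static int prefixLength(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    // Picks the token kind with the longer prefix match.
    static String classify(String input) {
        int closed = prefixLength(CLOSED, input);
        int open = prefixLength(OPEN, input);
        if (closed < 0 && open < 0) {
            return "NONE";
        }
        return closed >= open ? "MLCOMMENT:" + closed
                              : "MLCOMMENT_UNDELIM:" + open;
    }

    public static void main(String[] args) {
        System.out.println(classify("/*a*/b*/")); // MLCOMMENT:5
        System.out.println(classify("/**//"));    // MLCOMMENT:4
        System.out.println(classify("/*a*b"));    // MLCOMMENT_UNDELIM:5
    }
}
```

The unterminated expression can never consume a "*/" pair, so whenever a closing delimiter exists the closed expression produces the longer match and wins. The two tokens therefore never compete for the same text.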
Upvotes: 5