Reputation: 123
I'm trying to parse a javadoc-style syntax in the following format:
/**
* this is description text
* this is description text also
* @name ID
* @param one
*/
Here's my grammar:
query_comment : BEGIN_QDOC (description_text | NOMANSLAND)*
name_declaration
(param_declaration | INNER_WS | NOMANSLAND)*
END_QDOC ;
name_declaration : NAME_KEY INNER_WS ID;
param_declaration : PARAM_KEY INNER_WS ID;
description_text : ~('\n')+;
BEGIN_QDOC : '/**';
END_QDOC : ('*/' | NASTY_GARBAGE '*/');
/*
* Stupid keywords.
*/
NAME_KEY : '@name';
PARAM_KEY : '@param';
/*
* Defines what constitutes a valid identifier.
*/
ID : ('a'..'z' | 'A'..'Z' | '0'..'9' | '-' | '_' | '?')+ ;
/*
* White space and garbage definitions.
*/
NOMANSLAND : NASTY_GARBAGE '*';
fragment NASTY_GARBAGE : '\r'? '\n' (INNER_WS)?;
INNER_WS : (' ' |'\t')+;
What I don't understand is why the description text is not parsing properly. It appears to be breaking the description text block up into ID and INNER_WS tokens, which doesn't make any sense to me, since ~('\n') ought to come first in priority and be applied first. Instead, 'this', 'is', 'description' and 'text' each match ID tokens, which means the description text can't contain punctuation.
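The splitting described above happens because lexing is decided by the lexer rules alone. description_text is a parser rule, so inside it ~('\n') means "any single token other than the '\n' token" and never influences how characters are grouped into tokens. The following is a hand-rolled Python sketch (not ANTLR-generated code) that imitates ANTLR's lexer rule selection, longest match first and ties broken by declaration order, over a subset of the question's lexer rules. The function name lex_like_question is hypothetical, and END_QDOC and NOMANSLAND are left out for brevity.

```python
import re

# Hypothetical subset of the question's lexer rules, kept in declaration order.
# Like ANTLR, the sketch takes the longest match and breaks ties by rule order.
QUESTION_RULES = [
    ("BEGIN_QDOC", re.compile(r"/\*\*")),
    ("NAME_KEY", re.compile(r"@name")),
    ("PARAM_KEY", re.compile(r"@param")),
    ("ID", re.compile(r"[a-zA-Z0-9_?-]+")),
    ("INNER_WS", re.compile(r"[ \t]+")),
]


def lex_like_question(text):
    """Split text into (type, lexeme) pairs without looking at any parser rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in QUESTION_RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule declared first wins.
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            # ANTLR would report a token recognition error here instead.
            raise ValueError(f"no lexer rule matches {text[pos]!r} at offset {pos}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens


print(lex_like_question("this is description text"))
# [('ID', 'this'), ('INNER_WS', ' '), ('ID', 'is'), ('INNER_WS', ' '), ('ID', 'description'), ('INNER_WS', ' '), ('ID', 'text')]
```

Feeding it text with a comma, such as "text, too", raises ValueError, because no rule in this subset matches ','. That is the same reason the description cannot contain punctuation.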
Upvotes: 1
Views: 520
Reputation: 5962
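This is a perfect example of an island grammar: you care about the islands of javadoc and don't care about the sea of stuff around them. The root cause is that the lexer splits the input into tokens without knowing which parser rule is active, so ~('\n') in the parser rule description_text just means "any token other than '\n'"; it cannot stop the lexer from producing ID and INNER_WS tokens. The solution is to use lexical modes, as described in the book. Essentially, you need one mode for normal Java parsing and another mode for what goes on inside the comments. Rules like NOMANSLAND would belong to the sea outside. When you see the start of a comment, you enter the "inside" mode, which is where you would need rules like INNER_WS.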
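To make the idea of lexical modes concrete, here is a hand-rolled Python sketch of a mode-switching lexer. It is not ANTLR-generated code, and everything in it is an assumption for illustration: the mode names DEFAULT, INSIDE and TAG, the rule names SEA, GUTTER, WS and TEXT, the function lex_with_modes, and the sample input (which adds a comma to show punctuation surviving). TAG is an extra third mode, beyond the two the answer describes, so that the argument after @name or @param still lexes as an ID.

```python
import re

# Hypothetical lexer specification: each mode has its own ordered rule list of
# (token type, regex, skip the token?, mode to switch to afterwards or None).
GUTTER = r"\r?\n[ \t]*\*(?!/)"      # line break plus the leading "*" of a comment line
END = r"\r?\n[ \t]*\*/|\*/"         # "*/", optionally on a line of its own
MODES = {
    "DEFAULT": [  # the sea: everything outside a doc comment is skipped
        ("BEGIN_QDOC", r"/\*\*", False, "INSIDE"),
        ("SEA", r".", True, None),
    ],
    "INSIDE": [  # the island: free-form description text, punctuation allowed
        ("END_QDOC", END, False, "DEFAULT"),
        ("GUTTER", GUTTER, True, None),
        ("NAME_KEY", r"@name", False, "TAG"),
        ("PARAM_KEY", r"@param", False, "TAG"),
        ("WS", r"[ \t]+", True, None),
        ("TEXT", r"[^\s@*](?:[^\r\n*]|\*(?!/))*", False, None),
    ],
    "TAG": [  # right after @name/@param: expect one identifier
        ("END_QDOC", END, False, "DEFAULT"),
        ("GUTTER", GUTTER, True, "INSIDE"),
        ("INNER_WS", r"[ \t]+", True, None),
        ("ID", r"[a-zA-Z0-9_?-]+", False, "INSIDE"),
    ],
}
# DOTALL lets the SEA rule's "." also swallow line breaks.
COMPILED = {
    mode: [(name, re.compile(pattern, re.DOTALL), skip, nxt)
           for name, pattern, skip, nxt in rules]
    for mode, rules in MODES.items()
}


def lex_with_modes(text):
    """Return (type, lexeme) pairs, switching rule sets whenever a rule changes mode."""
    mode, pos, tokens = "DEFAULT", 0, []
    while pos < len(text):
        best = None
        for rule in COMPILED[mode]:
            m = rule[1].match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (rule, m)
        if best is None:
            raise ValueError(f"no rule in mode {mode} matches {text[pos]!r} at offset {pos}")
        (name, _, skip, next_mode), m = best
        if not skip:
            tokens.append((name, m.group()))
        if next_mode is not None:
            mode = next_mode
        pos = m.end()
    return tokens


source = (
    "int x = 1; /**\n"
    " * this is description text, with punctuation.\n"
    " * this is description text also\n"
    " * @name ID\n"
    " * @param one\n"
    " */ class Foo {}"
)
for token in lex_with_modes(source):
    print(token)
```

Each description line now comes out as a single TEXT token, comma included, and the surrounding Java is skipped entirely. In a real ANTLR 4 grammar the same structure is written as mode sections in a separate lexer grammar, using lexer commands such as -> pushMode(...), -> popMode, -> mode(...) and -> skip. Modes are not allowed in a combined grammar, so the parser rules would move into their own parser grammar that imports the token types with options { tokenVocab=...; }.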
Upvotes: 1