Reputation: 2916
I am trying to write an ANTLR4 grammar to parse actionscript3. I've decided to start with something fairly coarse grained:
grammar actionscriptGrammar;
OBRACE:'{';
CBRACE:'}';
STRING_DELIM:'"';
BLOCK_COMMENT : '/*' .*? '*/' -> skip;
EOL_COMMENT : '//' .*? '/n' -> skip;
WS: [ \n\t\r]+ -> skip;
TEXT: ~[{} \n\t\r"]+;
thing
: TEXT
| string_literal
| OBRACE thing+? CBRACE;
string_literal : STRING_DELIM .+? STRING_DELIM;
start_rule
: thing+?;
Basically, I want a tree of things grouped by their lexical scope. I want comments to be ignored, and string literals be their own things so that any braces they may include do not affect lexical scope. The string_literal rule works fine (such as it is) but the two comment rules don't appear to have any effect. (i.e. comments aren't being ignored).
What am I missing?
Upvotes: 4
Views: 10379
Reputation: 5991
The answer is that your TEXT rule is consuming your comments. Rather than using a negated set, use something like:
TEXT: [a-zA-Z0-9_][/a-zA-Z0-9.;()\[\]_-]+ ;
That way, your comments cannot be matched by TEXT.
Upvotes: 4
Reputation: 231
This is from a simplified Java grammar I wrote in ANTLR v4.
WS
: [ \t\r\n]+ -> channel(HIDDEN)
;
COMMENT
: '/*' .*? '*/' -> skip
;
LINE_COMMENT
: '//' ~[\r\n]* -> skip
;
May be this could help you out.
Also, try rearranging your code. Write the Parser Rules first and Lexer Rules last. Follow a Top-Down approach. I find it much more helpful in debugging. It will also look nice when you create an HTML export of your grammar from ANTLR 4 Eclipse Plugin.
Good Luck!
Upvotes: 8