I have the following grammar.
meta : '<' TAG attribute* '>';
attribute : NAME '=' VAL;
TAG : [A-Z0-9]+;
NAME : [A-Z_-]+;
VAL : '"'.*?'"';
I want to match the below string.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
But I am getting the following error.
ParseError extraneous input 'CONTENT' expecting {'>', NAME} clj-antlr.common/parse-error (common.clj:146)
I am able to parse with one attribute.
<META HTTP-EQUIV="Content-Type">
How do I parse repeated attributes? Using attribute* has no effect.
Update: It's actually caused by the lexer. If I combine TAG and NAME, then it works.
meta : '<' NAME attribute* '>';
NAME : [A-Z0-9_-]+;
But I don't want NAME to contain numbers. Is there a way to make this work?
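To illustrate why the lexer is the culprit: an ANTLR lexer picks the rule with the longest match and, on a tie, the rule defined first. The following is only a rough Python simulation of that behaviour, not ANTLR itself. The `tokenize` helper, the `LT`/`GT`/`EQ` names for the literal tokens, and the whitespace skipping are assumptions made for the sketch.

```python
import re

# Lexer rules from the question, in definition order (order decides ties).
# LT, GT and EQ are hypothetical names standing in for the literals '<', '>', '='.
RULES = [
    ("TAG", r"[A-Z0-9]+"),
    ("NAME", r"[A-Z_-]+"),
    ("VAL", r'".*?"'),
    ("LT", r"<"),
    ("GT", r">"),
    ("EQ", r"="),
]


def tokenize(text, rules=RULES):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is skipped
            pos += 1
            continue
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


tokens = tokenize('<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">')
for kind, text in tokens:
    print(kind, text)
```

In this sketch HTTP-EQUIV comes out as NAME because the hyphen makes the NAME match longer, while CONTENT ties between TAG and NAME and falls to TAG, which is consistent with the "extraneous input 'CONTENT'" error above.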
Upvotes: 0
Views: 343
Reputation: 3516
You can use two independent lexer rules and then use parser rules to combine them:
ID: [A-Za-z]+ ;
NUMBER: [0-9]+ ;
tag: ID+ tag? | NUMBER+ tag? ;
name: ID+ name? | ('_' | '-')+ name? ;
If ignoring whitespace between the elements causes problems, you can send it to a different channel and enable it only in the above parser rules. It might even work to define the above parser rules as lexer rules, but I'm not sure of that.
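As a rough illustration of what the lexer would produce with these non-overlapping rules, here is a small Python simulation (not ANTLR itself). The `tokenize` helper, the `PUNCT` name for the literal tokens, the VAL rule carried over from the question, and the whitespace skipping are all assumptions of the sketch.

```python
import re

# Token rules modelled on the ID / NUMBER rules above.
# PUNCT is a hypothetical stand-in for the literals '<', '>', '=', '_' and '-'.
RULES = [
    ("ID", r"[A-Za-z]+"),
    ("NUMBER", r"[0-9]+"),
    ("VAL", r'".*?"'),
    ("PUNCT", r"[<>=_-]"),
]


def tokenize(text, rules=RULES):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is skipped
            pos += 1
            continue
        best = None
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


tokens = tokenize('<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">')
kinds = [kind for kind, _ in tokens]
print(kinds)
```

Because no two rules can match the same text, META and CONTENT both come out as ID wherever they appear, and the parser rules decide whether a run of tokens is a tag or a name. The price is that HTTP-EQUIV becomes three tokens, so with whitespace skipped "HTTP - EQUIV" would tokenize the same way, which is the whitespace concern mentioned above.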
Upvotes: 1