user235273

How to parse repeated attributes with antlr?

I have the following grammar.

meta : '<' TAG attribute* '>';

attribute : NAME '=' VAL;

TAG : [A-Z0-9]+;

NAME : [A-Z_-]+;

VAL : '"'.*?'"';

I want to match the below string.

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

But I am getting the following error.

ParseError extraneous input 'CONTENT' expecting {'>', NAME}  clj-antlr.common/parse-error (common.clj:146)

I am able to parse with one attribute.

<META HTTP-EQUIV="Content-Type">

How can I parse repeated attributes? Adding attribute* has no effect.

Update: It's actually caused by the lexer. If I combine TAG and NAME then it works.

meta : '<' NAME attribute* '>';
NAME : [A-Z0-9_-]+;

But I don't want NAME to contain numbers. Is there a way to make this work?

Upvotes: 0

Views: 343

Answers (1)

Raven

Reputation: 3516

You can use two independent lexer rules and then combine them in parser rules:

ID: [A-Za-z]+ ;
NUMBER: [0-9]+ ;

tag: ID+ tag? | NUMBER+ tag? ;
name: ID+ name? | ('_' | '-')+ name? ;

If ignoring whitespace between the elements causes problems, you can send it to a different channel and enable it only in the parser rules above. It might even work to define those parser rules as lexer rules, but I'm not sure about that.
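As for why the original grammar fails: the ANTLR lexer picks the rule that matches the most input, and when two rules match the same length it picks the one defined first. CONTENT matches both TAG and NAME with the same length, so it comes out as TAG, while HTTP-EQUIV only fits NAME in full because of the hyphen. Here is a rough Python sketch of that behaviour. It is not ANTLR itself: the RULES table and the tokenize helper are made up for illustration, and I am assuming whitespace is simply skipped.

```python
import re

# Hypothetical stand-ins for the question's lexer rules, in definition order.
# The literals used in the parser rules ('<', '>', '=') are listed first,
# since ANTLR gives implicit literal tokens priority over named lexer rules.
RULES = [
    ("'<'", re.compile(r"<")),
    ("'>'", re.compile(r">")),
    ("'='", re.compile(r"=")),
    ("TAG", re.compile(r"[A-Z0-9]+")),
    ("NAME", re.compile(r"[A-Z_-]+")),
    ("VAL", re.compile(r'".*?"')),
]


def tokenize(text):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # assumption: whitespace is skipped
            continue
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when the lengths are equal.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


tokens = tokenize(
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">'
)
for name, value in tokens:
    print(name, value)
```

In this sketch CONTENT is reported as TAG, which is consistent with the parser complaining that it expected '>' or NAME at that point. With ID and NUMBER as above, the two lexer rules never match the same text, so the tie does not come up.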

Upvotes: 1
