user1956524

Reputation: 11

ANTLR4 white space causing problems

I have been using CoCoR for quite a while and thought I'd take a look at ANTLR4, using the C# version. I put together the beginnings of a grammar and found that it didn't work. After a lot of experimenting, I narrowed the problem down to whitespace handling. Below is a small grammar that demonstrates it:

grammar AB;

/*
 * Parser Rules
 */
parse: ab;


ab: IDENT ( ',' IDENT )*;

/*
 * Lexer Rules
*/

IDENT: A_Z_ ( A_Z_ | DIGIT )*;

fragment A_Z_: [A-Z,a-z,_];
fragment DIGIT: [0-9];

WS: [ \t\r\n]+ -> skip;

When I give the grammar the following inputs, I get:

A,B Gives no syntax error.

A , B Gives no syntax error.

A, B Gives: Line: 1 extraneous input 'B' expecting {<EOF>, ','}

A ,B Gives: Line: 1 extraneous input ',B' expecting {<EOF>, ','}

I'm probably missing something in my understanding of whitespace handling, but I thought the WS rule was supposed to throw away all whitespace, so any of these inputs would be equivalent to the A,B input that works. Also, commenting out the WS rule makes no difference. It's as though the WS rule is doing nothing.

Upvotes: 1

Views: 230

Answers (1)

Bart Kiers

Reputation: 170158

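The problem is that the comma is part of your IDENT rule. Inside a lexer character set, a comma is a literal character, not a separator, so [A-Z,a-z,_] also matches ','. That means A, and ,B are each tokenized as a single IDENT, which is exactly what the error messages show.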

Don't do:

fragment A_Z_: [A-Z,a-z,_];

but do this instead:

fragment A_Z_: [A-Za-z_];
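
To see why the four inputs behave differently, here is a rough sketch in Python that imitates the lexer with regular expressions. It is only an analogy, not ANTLR itself: the tokenize helper, the BUGGY/FIXED names, and the COMMA token name are made up for illustration, and it assumes the usual longest-match rule with ties going to the rule listed first (the implicit ',' literal). Python character classes treat a comma as a literal character in the same way.

```python
import re

def tokenize(text, ident_char):
    """Hypothetical longest-match tokenizer that roughly imitates the AB grammar's lexer."""
    rules = [
        ("COMMA", re.compile(r",")),  # stands in for the implicit ',' literal, listed first
        ("IDENT", re.compile(ident_char + r"(?:" + ident_char + r"|[0-9])*")),
        ("WS", re.compile(r"[ \t\r\n]+")),  # dropped below, like "-> skip"
    ]
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no token matches at position {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

BUGGY = r"[A-Z,a-z,_]"  # the commas are literal members of the set
FIXED = r"[A-Za-z_]"

for source in ["A,B", "A , B", "A, B", "A ,B"]:
    print(repr(source), "buggy:", tokenize(source, BUGGY), "fixed:", tokenize(source, FIXED))
```

With the buggy set, A,B comes out as one IDENT (which the parser happily accepts), A, B as the two IDENTs A, and B, and A ,B as the two IDENTs A and ,B. Only A , B produces IDENT, comma, IDENT, because there the lone comma ties in length and the literal wins. With the fixed set, all four inputs produce IDENT, comma, IDENT.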

Upvotes: 1
