Reputation: 11
I have been using CoCoR for quite a while and I thought I'd take a look at ANTLR4. I'm using the C# version of ANTLR4. I put together the beginnings of a grammar and found that it didn't work. After a lot of experimenting I found that the problem came down to whitespace handling. Below is a small grammar that demonstrates the problem:
grammar AB;
/*
* Parser Rules
*/
parse: ab;
ab: IDENT ( ',' IDENT )*;
/*
* Lexer Rules
*/
IDENT: A_Z_ ( A_Z_ | DIGIT )*;
fragment A_Z_: [A-Z,a-z,_];
fragment DIGIT: [0-9];
WS: [ \t\r\n]+ -> skip;
When I feed the grammar the following inputs, I get:
A,B Gives no syntax error.
A , B Gives no syntax error.
A, B Gives: Line: 1 extraneous input 'B' expecting {<EOF>, ','}
A ,B Gives: Line: 1 extraneous input ',B' expecting {<EOF>, ','}
I'm probably missing something in my understanding of the whitespace handling, but I thought the WS rule was supposed to throw away all whitespace, so any input would have been equivalent to the A,B input that works. Also, if I comment out the WS rule it makes no difference. It's as though the WS rule is doing nothing.
Upvotes: 1
Views: 230
Reputation: 170158
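The problem is that the comma is being used in your IDENT rule. Inside a character set the comma is not a separator: [A-Z,a-z,_] matches a literal ',' as well as letters and the underscore, so IDENT can swallow the commas in your input. The WS rule is working; it is the comma that ends up inside the identifier tokens.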
Don't do:
fragment A_Z_: [A-Z,a-z,_];
but do this instead:
fragment A_Z_: [A-Za-z_];
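To see why the four inputs behave differently, here is a rough Python simulation of the lexer. It is only a sketch, not ANTLR itself: the tokenize helper and the BROKEN_LETTER / FIXED_LETTER names are made up for illustration, and it assumes the usual ANTLR lexer behaviour that the longest match wins and that, on a tie, the rule defined first wins (the implicit ',' token from the parser rule comes before IDENT).

```python
import re

# Character sets copied from the two versions of the A_Z_ fragment.
BROKEN_LETTER = "[A-Z,a-z,_]"  # the comma is a literal member of the set
FIXED_LETTER = "[A-Za-z_]"


def tokenize(text, letter):
    """Tokenize text: longest match wins, ties go to the earlier rule."""
    rules = [
        ("COMMA", re.compile(",")),  # stands in for the implicit ',' token
        ("IDENT", re.compile(f"{letter}(?:{letter}|[0-9])*")),
        ("WS", re.compile(r"[ \t\r\n]+")),
    ]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in rules:
            match = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # mirrors `WS: ... -> skip`
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    for source in ["A,B", "A , B", "A, B", "A ,B"]:
        print(repr(source), tokenize(source, BROKEN_LETTER))
        print(repr(source), tokenize(source, FIXED_LETTER))
```

With the broken set, "A, B" lexes as the two identifiers "A," and "B", and "A ,B" lexes as "A" and ",B", which lines up with the two "extraneous input" errors. "A,B" becomes one single identifier, so it only appears to work. With the fixed set, all four inputs produce IDENT, ',', IDENT.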
Upvotes: 1