I have a simple grammar as follows:
grammar SampleConfig;
line: ID (WS)* '=' (WS)* string;
ID: [a-zA-Z]+;
string: '"' (ESC|.)*? '"' ;
ESC : '\\"' | '\\\\' ; // 2-char sequences \" and \\
WS: [ \t]+ -> skip;
The spaces in the input are completely ignored, including those in the string literal.
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

final String input = "key = \"value with spaces in between\"";
final SampleConfigLexer l = new SampleConfigLexer(new ANTLRInputStream(input));
final SampleConfigParser p = new SampleConfigParser(new CommonTokenStream(l));
final SampleConfigParser.LineContext context = p.line();
System.out.println(context.getChildCount() + ": " + context.getText());
This prints the following output:
3: key="valuewithspacesinbetween"
But I expected the whitespace in the string literal to be retained, i.e.:
3: key="value with spaces in between"
Is it possible to fix the grammar to get this behavior, or should I just override CommonTokenStream to ignore whitespace during parsing?
You shouldn't expect any spaces in parser rules since you're skipping them in your lexer.
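To make the effect concrete, here is a minimal sketch in plain Java, with no ANTLR runtime involved. The class SkipDemo and its methods are hypothetical and only mimic the question's lexer: whitespace is dropped before any token is produced, and the token texts are then joined with no separator, which is roughly what getText() does on a parse tree. The spaces from the string literal cannot come back because they never became tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer (not ANTLR) mimicking the question's lexer rules:
//   ID: [a-zA-Z]+;   WS: [ \t]+ -> skip;   '=' and '"' as single-char tokens.
// ESC is left out because the sample input contains no backslashes.
public class SkipDemo {

    static boolean isIdChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++; // WS -> skip: whitespace never becomes a token
            } else if (isIdChar(c)) {
                int start = i;
                while (i < input.length() && isIdChar(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i)); // one ID token
            } else {
                tokens.add(String.valueOf(c)); // '=', '"' or any other single char
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "key = \"value with spaces in between\"";
        // Join token text with no separator, roughly what getText() does.
        System.out.println(String.join("", tokenize(input))); // prints key="valuewithspacesinbetween"
    }
}
```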
Either remove the skip command, or make string a lexer rule (and reference STRING instead of string in the line rule):
STRING : '"' ( '\\' [\\"] | ~[\\"\r\n] )* '"';
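With STRING as a lexer rule, the whole literal, spaces included, is matched as a single token, so the WS rule never gets to see the spaces inside it. The sketch below approximates just that one rule with java.util.regex. The names StringRuleDemo and firstStringToken are hypothetical, and a regex only models this single rule, not ANTLR's longest-match choice between competing lexer rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough java.util.regex translation of the lexer rule
//   STRING : '"' ( '\\' [\\"] | ~[\\"\r\n] )* '"';
public class StringRuleDemo {

    // "  then any number of (backslash + [\ or "]) or (anything but \ " CR LF), then "
    static final Pattern STRING =
            Pattern.compile("\"(?:\\\\[\\\\\"]|[^\\\\\"\\r\\n])*\"");

    // Returns the first text matching the STRING rule, or null if there is none.
    static String firstStringToken(String input) {
        Matcher m = STRING.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "key = \"value with spaces in between\"";
        System.out.println(firstStringToken(input)); // prints "value with spaces in between"
        // An escaped quote stays inside the same token.
        System.out.println(firstStringToken("k = \"say \\\"hi\\\"\"")); // prints "say \"hi\""
    }
}
```

The '\\' [\\"] alternative is what lets an escaped quote sit inside the literal without terminating it.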