Overdose

Reputation: 1500

Special character handling in ANTLR lexer

I wrote the following grammar for string variable declarations. A string is anything between single quotes, but there must be a way to include a single quote in the string value by escaping it with the $ character.

grammar test;

options       
{   
    language = Java;
}


tokens
{   
    VAR = 'VAR';
    END_VAR = 'END_VAR';
}


var_declaration: VAR string_type_declaration END_VAR EOF;

string_type_declaration: identifier ':=' string;

identifier: ID;

string: STRING_VALUE;

STRING_VALUE: '\'' ('$\''|.)* '\'';

ID:  LETTER+;

WSFULL:(' ') {$channel=HIDDEN;};

fragment LETTER: (('a'..'z') | ('A'..'Z'));

This grammar doesn't work. If I run the var_declaration rule on this input:

VAR A :='$12.2' END_VAR

I get a MismatchedTokenException.

But this input works fine with the string_type_declaration rule:

A :='$12.2' 

Upvotes: 3

Views: 6164

Answers (1)

Bart Kiers

Reputation: 170178

Your STRING_VALUE rule doesn't tokenize the input properly. Inside the loop ( ... )*, the alternative '$\'' requires a single quote right after the $, but the string in your input, '$12.2', has no quote after the $. You should make the single quote optional: ('$' '\''? | .)*. However, the other alternative in the loop, the ., would then still match a single quote as well. It is better to let it match any character other than a single quote or $:

STRING_VALUE
 : '\'' ( '$' '\''? | ~('$' | '\'') )* '\''
 ;

resulting in the following parse tree:

[image: parse tree]
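
To check the corrected rule against a few inputs without generating a lexer, here is a rough sketch that mirrors it with a java.util.regex pattern. This is only an approximation for illustration, not ANTLR itself: the class name StringValueDemo and the method isStringValue are made up for this example, and a regex engine backtracks where the ANTLR lexer predicts, so edge cases such as a $ directly before the closing quote may behave differently.

```java
import java.util.regex.Pattern;

// Regex approximation of the corrected STRING_VALUE rule (illustration only, not ANTLR).
// StringValueDemo and isStringValue are hypothetical names, not part of the grammar.
class StringValueDemo {

    // Mirrors: '\'' ( '$' '\''? | ~('$' | '\'') )* '\''
    //   \$'?    a $ optionally followed by an escaped single quote
    //   [^$']   any character other than $ or a single quote
    private static final Pattern STRING_VALUE =
            Pattern.compile("'(?:\\$'?|[^$'])*'");

    // Returns true when the whole text forms exactly one string literal.
    static boolean isStringValue(String text) {
        return STRING_VALUE.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "'$12.2'", "'it$'s'", "'abc", "'a'b'" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isStringValue(sample));
        }
    }
}
```

The first two samples are accepted: '$12.2' has a $ with no quote after it, and 'it$'s' contains an escaped quote. The last two are rejected, since 'abc is never closed and 'a'b' contains an unescaped quote in the middle.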

Upvotes: 5
