Henrique Ferrolho

Reputation: 942

ANTLR v4 different tokens for the same symbols

How can I recognize different tokens for the same symbol in ANTLR v4? For example, in selected = $("library[title='compiler'] isbn"); the first = is an assignment, whereas the second = is an operator.

Here are the relevant lexer rules:

EQUALS
:
    '='
;

OP
:
    '|='
    | '*='
    | '~='
    | '$='
    | '='
    | '!='
    | '^='
;

And here is the parser rule for that line:

assign
:
    ID EQUALS DOLLAR OPEN_PARENTHESIS QUOTES ID selector ID QUOTES
    CLOSE_PARENTHESIS SEMICOLON
;

selector
:
    OPEN_BRACKET ID OP APOSTROPHE ID APOSTROPHE CLOSE_BRACKET
;

This parses the line correctly as long as the operator in the selector is anything other than =.

Here is the error log:

JjQueryParser::init:34:29: mismatched input '=' expecting OP
JjQueryParser::init:34:39: mismatched input ''' expecting '\"'
JjQueryParser::init:34:46: mismatched input '"' expecting '='

Upvotes: 0

Views: 1548

Answers (2)

Sym-Sym

Reputation: 3606

I had the same issue and resolved it in the lexer as follows:

EQUALS: '=';
OP    : '|' EQUALS
      | '*' EQUALS
      | '~' EQUALS
      | '$' EQUALS
      | '!' EQUALS
      | '^' EQUALS
      ;

This guarantees that a bare '=' is always lexed as one token type, EQUALS, regardless of where it appears. Don't forget to update the parser rule so that it accepts either token:

selector
:
    OPEN_BRACKET ID (OP | EQUALS) APOSTROPHE ID APOSTROPHE CLOSE_BRACKET
;
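
If more than one parser rule needs to accept "any comparison operator", the alternation can be pulled into a small parser rule of its own. A minimal sketch, assuming the lexer rules above; the rule name comparator is hypothetical and not part of the original grammar:

```antlr
// Hypothetical helper rule: groups the two token types that can
// appear as the operator inside a selector.
comparator
:
    OP
    | EQUALS
;

selector
:
    OPEN_BRACKET ID comparator APOSTROPHE ID APOSTROPHE CLOSE_BRACKET
;
```

The assign rule keeps using EQUALS directly, so an operator such as '|=' is still rejected in an assignment.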

Upvotes: 1

CoronA

Reputation: 8095

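The problem cannot be solved in the lexer, because the lexer always returns the same token type for the same string. It has no knowledge of the parser's context, and when two lexer rules match the same text, the rule defined first wins (here, EQUALS). It is easy to resolve in the parser, though. Just rewrite the two rules as parser rules, with lower-case names: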

equals
:
    '='
;

op
:
    '|='
    | '*='
    | '~='
    | '$='
    | '='
    | '!='
    | '^='
;
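
The rules from the question would then refer to these parser rules instead of the EQUALS and OP tokens. A minimal sketch, assuming a combined grammar (so that the quoted literals become implicit tokens), that the old OP lexer rule has been removed, and that ID, DOLLAR and the other tokens stay as defined in the question's grammar:

```antlr
// The same '=' token is now accepted in both contexts, because
// the parser, not the lexer, decides what role it plays.
assign
:
    ID equals DOLLAR OPEN_PARENTHESIS QUOTES ID selector ID QUOTES
    CLOSE_PARENTHESIS SEMICOLON
;

selector
:
    OPEN_BRACKET ID op APOSTROPHE ID APOSTROPHE CLOSE_BRACKET
;
```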

Upvotes: 1
