Antlr 3.4.0 mismatched input for generated parser and not in interpreter

Question

I know this has been discussed a thousand times, but I still cannot figure out why the following grammar fails. In the interpreter everything works fine, without any errors or warnings. However, when I run the generated code, I get the mismatched input errors shown below.

For this grammar:

grammar xxx;

options {
    language = Java;
    output = AST;
}

@members {
  @Override
    public String getErrorMessage(RecognitionException e,
    String[] tokenNames)
    {
        List stack = getRuleInvocationStack(e, this.getClass().getName());
        String msg = null;
        if ( e instanceof NoViableAltException ) {
            NoViableAltException nvae = (NoViableAltException)e;
            msg = " no viable alt; token="+e.token+
            " (decision="+nvae.decisionNumber+
            " state "+nvae.stateNumber+")"+
            " decision=<<"+nvae.grammarDecisionDescription+">>";
        }
        else {
          msg = super.getErrorMessage(e, tokenNames);
        }
        return stack+" "+msg;
    }

  @Override
    public String getTokenErrorDisplay(Token t) {
      return t.toString();
    }
}

obj
      : first=subscription 
      (COMMA other=subscription)*
      ;

subscription
      : ID
      (EQUALS arguments_in_brackets)?
      filters
      ;

arguments_in_brackets
      : LOPAREN arguments ROPAREN
      ;

arguments
      : (arguments_body)
      ;

arguments_body
      : argument (arguments_more)?
      ;

arguments_more
      : SEMICOLON arguments_body
      ;

argument
    : id_equals argument_body
    ;

argument_body
    :   STRING
    |   INT
    |   FLOAT
    ;

filters
      : LSPAREN expression RSPAREN
      ;

expression
      :  or
      ;

or
    : first=and
    (OR^ second=and)*
    ;

and
    : first=atom
    (AND^ second=atom)*
    ;

atom
    : filter
    | atom_expression
    ;

atom_expression
    : LCPAREN
    expression
    RCPAREN
    ;

filter
    : id_equals arguments_in_brackets
    ;

id_equals
    : WS* ID WS* EQUALS WS*
    ;

COMMA: WS* ',' WS*;
LCPAREN : WS* '(' WS*;
RCPAREN : WS* ')' WS*;
LSPAREN : WS* '[' WS*;
RSPAREN : WS* ']' WS*;
LOPAREN : WS* '{' WS*;
ROPAREN : WS* '}' WS*;
AND: WS* 'AND' WS*;
OR: WS* 'OR' WS*;
NOT: WS* 'NOT' WS*;
EQUALS: WS* '=' WS*;
SEMICOLON: WS* ';' WS*;

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
    |   '.' ('0'..'9')+ EXPONENT?
    |   ('0'..'9')+ EXPONENT
    ;

//    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
//    :   '"' (~'"')* '"'
STRING
    :   '"' (~'"')* '"'
    ;

fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

NEWLINE:    '\r'? '\n' {skip();} ;

WS:     (' '|'\t')+ {skip();} ;

And for this input:

status={name="Waiting";val=5}[ownerEmail1={email="dsa@fdsf.ds"} OR internalStatus={status="New"}],comments={type="fds"}[(internalStatus={status="Owned"} AND ownerEmail2={email="dsa@fds.ds"}) OR (role={type="Contributor"} AND status={status="Closed"})]

I'm getting:

line 1:67 [obj, subscription, filters, expression, or, and, atom, filter, arguments_in_brackets] mismatched input [@18,67:80='internalStatus',<11>,1:67] expecting  ROPAREN
line 1:157 [obj, subscription, filters, expression, or, and, atom, atom_expression, expression, or, and, atom, filter, arguments_in_brackets] mismatched input [@42,157:167='ownerEmail2',<11>,1:157] expecting ROPAREN

Can someone give me a clue as to why this is failing, please? I've tried rewriting it in many ways, but the error is always the same.

Bart Kiers · Accepted Answer

The problem is that you're using the WS rule inside other lexer rules, and WS calls skip(). That skip() call then applies to the enclosing token as well: the lexer discards the token entirely, so it can never be matched in parser rules.

So, if you have a rule like:

WS : ' ' {skip();};

and then use this rule in NOT:

NOT : WS* 'NOT' WS*;

it causes the NOT token to be skipped as well.
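
To make the effect concrete, here is a toy model in plain Java. It is not ANTLR's code: SkipTokenDemo, lex and skipRequested are made-up names, and it assumes the lexer keeps one "skip" flag per token that the WS action sets and that is checked only after the outer rule has finished. It lexes blank-separated words and treats OR either as WS* 'OR' WS* or as plain 'OR'.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hand-written stand-in, not code generated by ANTLR.
final class SkipTokenDemo {
    private final String input;
    private int pos = 0;
    private boolean skipRequested; // assumed per-token "skip this token" flag

    private SkipTokenDemo(String input) { this.input = input; }

    // Models  WS : ' '+ {skip();} ;  The action runs only if WS matched something.
    private void ws() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos > start) skipRequested = true;
    }

    // Returns the run of non-blank characters starting at 'from' without consuming it.
    private String peekWord(int from) {
        int end = from;
        while (end < input.length() && input.charAt(end) != ' ') end++;
        return input.substring(from, end);
    }

    // wsInsideKeywordRule = true models   OR : WS* 'OR' WS* ;
    // wsInsideKeywordRule = false models  OR : 'OR' ;
    static List<String> lex(String input, boolean wsInsideKeywordRule) {
        SkipTokenDemo lx = new SkipTokenDemo(input);
        List<String> emitted = new ArrayList<>();
        while (lx.pos < input.length()) {
            lx.skipRequested = false; // reset before every token
            int afterBlanks = lx.pos;
            while (afterBlanks < input.length() && input.charAt(afterBlanks) == ' ') afterBlanks++;
            String next = lx.peekWord(afterBlanks);
            String text;
            if (wsInsideKeywordRule && next.equals("OR")) {
                lx.ws();       // leading WS*: sets the flag if blanks were consumed
                lx.pos += 2;   // the literal 'OR'
                lx.ws();       // trailing WS*
                text = "OR";
            } else if (afterBlanks > lx.pos) {
                lx.ws();       // standalone WS token, skipped on purpose
                text = " ";
            } else {
                lx.pos += next.length(); // an ID, or OR without WS in its rule
                text = next;
            }
            if (!lx.skipRequested) emitted.add(text); // a skipped token never reaches the parser
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(lex("a OR b", true));  // prints [a, b]
        System.out.println(lex("a OR b", false)); // prints [a, OR, b]
    }
}
```

In this model the OR token only disappears when whitespace actually surrounds it, which fits the errors in the question: the parser complains at the first ID that follows a "} OR " sequence.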

Since you're already skipping whitespace through the WS rule, you don't need to match it in your other lexer rules: simply remove every WS* from them:

...
NOT : 'NOT';
...

(Also remove WS from your parser rules, such as id_equals: tokens skipped by the lexer are never available in parser rules anyway!)
