jlengrand

Reputation: 12827

What if a token is expected after my expression?

I'm still working with ANTLR, and I'm looking for something I can't find in the docs.

Here is a parser rule that detects the 'not' pattern in a string:

 factor : 'not'^ primary | primary;
 (and some other lines)

But what if I want to detect a keyword after my primary? For example:

B exists

How can I define a parser rule that matches a keyword after the rest of my expression? I tried the following by analogy, but I haven't been able to get it to work so far:

 exists : primary 'exists'^ | primary;

Depending on where I place exists in my expression, I get either this error:

line 1:44 extraneous input 'exists' expecting ')'

or these:

line 1:3 mismatched input 'exists' expecting ')'
line 1:22 missing EOF at ')'

Thank you!

EDIT:

My grammar is the same as yours except for one thing. Here is my code:

// Aiming at parsing a complete BQS formed Query
grammar Logic;

options {
    backtrack=true;
    output=AST;
}

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/

 parse  
    : expression EOF -> expression
    ; // omit the EOF token

 expression
    : query
    ;       

 query  
    : term (OR^ term)*    // make `or` the root
    ;

 term   
    : factor (AND^ factor)*
    ;

 factor 
    : NOT^ primary 
    | primary
    ;


 primary // this one has to be completed (a lot)
    : atom (LIKE^ atom)* // right expressions have to be indicated
    | atom (EXISTS^)?
    ;

 atom   
    : ID 
    | '('! expression ')'! // omit both ( and )
    ;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
// GENERAL OPERATORS: 
NOTLIKE :   'notlike'; // whitespaces have been removed
LIKE    :   'like';
EXISTS  :   'exists';

OR          :   'or';
AND         :   'and';
NOT         :   'not';

//ELEMENTS 
ID          :   (CHARACTER)+;   

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

So the problem must come from that one difference.

I do have a serious problem here, though. Once the or, and and not operators have been handled, I expect my primary to take one of these forms:

A like B

or

A exists

What is wrong with my code? Isn't this exactly what my primary rule says?

I would really like to find a way to debug this myself, because this error:

line 1:3 mismatched input 'like' expecting ')'

is really not self-explanatory.

Thank you very much for the help; I really struggle to understand the ANTLR documentation website :s.

Upvotes: 1

Views: 176

Answers (1)

Bart Kiers

Reputation: 170298

I suspect you misplaced an EOF in one of your parser rules. There doesn't seem to be much wrong with the rules you posted; at least, nothing that could cause the errors you're seeing.

For (possible) future questions (not only ANTLR ones), I highly recommend posting a grammar (or example code) that is "self-contained": one that someone can easily run on their own machine, without modifying it (!), so that they see exactly what you see. As it stands, we can only guess what your other rules look like.

The following:

exists
 : atom (Exists^)?
 ;

works like a charm, as you can test yourself:

grammar T;

options {
  output=AST;
}

parse
 : expr EOF -> expr
 ;

expr
  :  orexp
  ;

orexp
  :  andexp (Or^ andexp)*
  ;

andexp
  :  not (And^ not)*
  ;

not
 : Not^ exists
 | exists
 ;

exists
 : atom (Exists^)?
 ;

atom
 : Num
 | Id
 | '(' expr ')' -> expr
 ;

Or     : 'or';
And    : 'and';
Exists : 'exists';
Not    : 'not';
Num    : '0'..'9'+;
Id     :  ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
Space  : ' ' {skip();};

The input:

(A or B) and (not C or D exists) or not E exists

produces the following AST:

[Image: the AST produced for this input]
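To show how the precedence levels in grammar T nest, here is a minimal Python sketch. It mirrors the grammar with one recursive-descent method per parser rule and builds the tree as nested tuples. This is not ANTLR's generated parser. The `Parser` class, the tuple representation and the regex-based tokenizer are illustrative assumptions. For the input above it should produce a tree of the same shape as the ANTLR one:

```python
import re

# Hypothetical Python mirror of grammar T (not ANTLR output).
# Tokens: numbers, identifiers/keywords, and any other single non-space char.
# Note: this skips all whitespace, whereas grammar T only skips ' '.
TOKEN = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z0-9_]*|\S")
OPERAND = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z0-9_]*")
KEYWORDS = {"or", "and", "not", "exists"}
EOF = "<EOF>"  # sentinel; the tokenizer can never produce this string


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else EOF

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    # parse : expr EOF -> expr   (all input must be consumed)
    def parse(self):
        tree = self.expr()
        self.expect(EOF)
        return tree

    # expr : orexp
    def expr(self):
        return self.orexp()

    # orexp : andexp (Or^ andexp)*   ('or' becomes the root, left-associative)
    def orexp(self):
        tree = self.andexp()
        while self.peek() == "or":
            self.pos += 1
            tree = ("or", tree, self.andexp())
        return tree

    # andexp : not (And^ not)*
    def andexp(self):
        tree = self.not_()
        while self.peek() == "and":
            self.pos += 1
            tree = ("and", tree, self.not_())
        return tree

    # not : Not^ exists | exists
    def not_(self):
        if self.peek() == "not":
            self.pos += 1
            return ("not", self.exists())
        return self.exists()

    # exists : atom (Exists^)?   (the postfix keyword becomes the root)
    def exists(self):
        tree = self.atom()
        if self.peek() == "exists":
            self.pos += 1
            tree = ("exists", tree)
        return tree

    # atom : Num | Id | '(' expr ')' -> expr
    def atom(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            tree = self.expr()
            self.expect(")")  # parentheses are dropped from the tree
            return tree
        if tok in KEYWORDS or not OPERAND.fullmatch(tok):
            raise SyntaxError(f"unexpected {tok!r}")
        self.pos += 1
        return tok
```

For that input, `Parser("(A or B) and (not C or D exists) or not E exists").parse()` returns `('or', ('and', ('or', 'A', 'B'), ('or', ('not', 'C'), ('exists', 'D'))), ('not', ('exists', 'E')))`. Each `^` in the grammar corresponds to the point where the sketch wraps the current tree in a new tuple with the operator first.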

EDIT

Both alternative 1 and alternative 2 of your primary rule:

 primary
  : atom (LIKE^ atom)* // alternative 1
  | atom (EXISTS^)?    // alternative 2
  ;

can match a single atom on its own. That ambiguity is why the parser can't parse your input properly (and why you needed to add backtrack=true;, which should be avoided!).
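Here is a small Python sketch that makes the overlap concrete. It checks which of the two original alternatives can match a given token sequence, with tokens reduced to abstract ATOM/LIKE/EXISTS symbols. That reduction is a hypothetical simplification, since a real atom can itself be a parenthesized expression. A lone atom is accepted by both alternatives, and that is exactly the case the parser cannot decide between:

```python
# Each function decides whether one alternative of the ORIGINAL primary rule
# matches the whole token list. "ATOM" stands in for anything atom accepts.

def alt1_matches(tokens):
    """alternative 1: atom (LIKE atom)*"""
    if tokens[:1] != ["ATOM"]:
        return False
    rest = tokens[1:]
    while rest[:2] == ["LIKE", "ATOM"]:
        rest = rest[2:]
    return not rest  # everything consumed

def alt2_matches(tokens):
    """alternative 2: atom (EXISTS)?"""
    return tokens in (["ATOM"], ["ATOM", "EXISTS"])

def matching_alternatives(tokens):
    alternatives = {"alternative 1": alt1_matches, "alternative 2": alt2_matches}
    return [name for name, matches in alternatives.items() if matches(tokens)]

print(matching_alternatives(["ATOM"]))                  # ['alternative 1', 'alternative 2']
print(matching_alternatives(["ATOM", "LIKE", "ATOM"]))  # ['alternative 1']
print(matching_alternatives(["ATOM", "EXISTS"]))        # ['alternative 2']
```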

I didn't test it, but I'm pretty sure it works if you remove backtrack=true; from the options block, and rewrite primary as follows:

primary
 : atom ( (LIKE^ atom)* // alternative 1
        | EXISTS^       // alternative 2
        )
 ;

Now a lone atom can only be matched by alternative 1 (with zero LIKE iterations), so there's no ambiguity, at least not in that rule.
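The same left-factoring can be sketched as a hand-written Python function. It matches the shared atom once, then looks at the next token to pick a branch, so no backtracking is needed. The function names and the tuple AST are hypothetical. `parse_atom` is also simplified to accept a single identifier, whereas the real atom handles parenthesized expressions too:

```python
def parse_atom(tokens, pos):
    """Simplified atom: a single identifier token (no parentheses here)."""
    if pos >= len(tokens) or tokens[pos] in ("like", "exists"):
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected an atom at token {pos}, found {found!r}")
    return tokens[pos], pos + 1

def parse_primary(tokens, pos=0):
    """primary : atom ( (LIKE^ atom)* | EXISTS^ ) ;  returns (tree, next position)."""
    tree, pos = parse_atom(tokens, pos)        # shared prefix, matched exactly once
    if pos < len(tokens) and tokens[pos] == "exists":
        return ("exists", tree), pos + 1       # alternative 2: EXISTS becomes the root
    while pos < len(tokens) and tokens[pos] == "like":
        right, pos = parse_atom(tokens, pos + 1)
        tree = ("like", tree, right)           # alternative 1: LIKE becomes the root
    return tree, pos                           # zero LIKEs: just the atom

print(parse_primary("A like B".split()))  # (('like', 'A', 'B'), 3)
print(parse_primary("A exists".split()))  # (('exists', 'A'), 2)
print(parse_primary(["A"]))               # ('A', 1)
```

Because the branch is chosen from a single token of lookahead after the atom, this mirrors why the rewritten rule no longer needs backtrack=true;.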

Upvotes: 3
