antlr grammar for tree construction from simple logic string

Question

I want to create a parser using antlr for the following string:

"1 AND (2 OR (3 AND 4)) AND 5" -> so i want to have AND and OR operations which should result in a tree after parsing was successful. this should result in the following tree:

i also want to avoid unclear inputs like "1 AND 2 OR 3" as it is not clear how to construct the tree from that. And it also seems like the parser "accepts" input with trailing sings such as "1 AND 2asdf".

what i have so far is (not working as expected):

    grammar code;

options {
  language=CSharp3;
  output=AST;
  ASTLabelType=CommonTree;
  //backtrack=true;
}

tokens {
  ROOT;
}


@rulecatch {
    catch {
        throw;
    }
}


@parser::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }
@lexer::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }

@lexer::members {
  public override void  ReportError(RecognitionException e) {
        throw e;
  }
}


public parse : exp EOF -> ^(ROOT exp);


exp
  : atom
    ( And^ atom (And! atom)*
    | Or^ atom (Or! atom)*
    )?
  ;

atom
  :  Number
  |  '(' exp ')' -> exp
  ;

Number
  :  ('0'..'9')+
  ;

And
  :  'AND' | 'and'
  ;

Or
  :  'OR' | 'or'
  ;

WS       :           (' '|'	'|'\f'|'
'|'
')+{ Skip(); };

Hope someone of you guys can help me get on the right track!

edit: and how can i archieve "1 AND 2 AND 3" to result in

instead of

EDIT:

thanks for the great solution, it works like a charm except for one thing: when i call the parse() method on the following term "1 AND (2 OR (1 AND 3) AND 4" (closing bracket missing) the parser still accepts the input as valid.

this is my code so far: grammar code;

options {
  language=CSharp3;
  output=AST;
  ASTLabelType=CommonTree;
}

tokens {
  ROOT;
}


@rulecatch {
    catch {
        throw;
    }
}


@lexer::members {
  public override void  ReportError(RecognitionException e) {
        throw e;
  }
}

public parse
  :  exp -> ^(ROOT exp)
  ;


exp
  : atom
    ( And^ atom (And! atom)*
    | Or^ atom (Or! atom)*
    )?
  ;

atom
  :  Number
  |  '(' exp ')' -> exp
  ;

Number
  :  ('0'..'9')+
  ;

And
  :  'AND' | 'and'
  ;

Or
  :  'OR' | 'or'
  ;

WS       :           (' '|'	'|'\f'|'
'|'
')+{ Skip(); };

edit2: i just found another problem with my grammar: when i have input like "1 AND 2 OR 3" the grammar gets parsed just fine, but it should fail because either the "1 AND 2" needs to be inside brackets or the "2 OR 3" part. i dont understand why the parser runs through as in my opinion this grammar should really cover that case. is there any sort of online-testing-environment or so to find the problem? (i tried antlrWorks but the errors given there dont lead me anywhere...)

edit3: updated the code to represent the new grammar like suggested.

i still have the same problem that the following grammar:

public parse : exp EOF -> ^(ROOT exp);

doesnt parse to the end.. the generated c# sources seem to just ignore the EOF... can you provide any further guidance on how i could resolve the issue?

edit4 i still have the same problem that the following grammar:

public parse : exp EOF -> ^(ROOT exp);

doesnt parse to the end.. the generated c# sources seem to just ignore the EOF... can you provide any further guidance on how i could resolve the issue?

the problem seems to be in this part of the code:

EOF2=(IToken)Match(input,EOF,Follow._EOF_in_parse97);  
            stream_EOF.Add(EOF2);

When i add the following code (just a hack) it works...

        if (EOF2.Text == "") {
            throw new Exception(EOF2.Text);
        }

can i change anything so the parser gets generated correclty from the start?

Sam Harwell · Accepted Answer

This rule will disallow expressions containing both AND and OR without parentheses. It will also construct the parse tree you described by making the first AND or OR token the root of the AST, and then hiding the rest of the AND or OR tokens from the same expression.

exp
  : atom
    ( 'AND'^ atom ('AND'! atom)*
    | 'OR'^ atom ('OR'! atom)*
    )?
  ;

Edit: The second problem is unrelated to this. If you don't instruct ANTLR to consume all input by including an explicit EOF symbol in one of your parser rules, then it is allowed to consume only a portion of the input in an attempt to successfully match something.

The original parse rule says "match some input as exp". The following modification to the parse rule says "match the entire input as exp".

public parse : exp EOF -> ^(ROOT exp);

antlr grammar for tree construction from simple logic string

Answers (1)

Related Questions