Keith Whittingham
Keith Whittingham

Reputation: 326

ANTLR parser rule to match: x and/or y and/or z in any order

Using ANTLR, is there a way to write a parser rule so that it can express: x and/or y and/or z in any order without writing Java. For example it should match: "x y", "y z", and "x y z" but not "x x y". The best I can think of is is the rule below but then I would need to check in the tree walker for "x x y".

rule: ( x | y | z )* ;

Upvotes: 3

Views: 597

Answers (2)

Keith Whittingham
Keith Whittingham

Reputation: 326

The best I can come up with is...

grammar Sandbox;

@members {
    boolean a, b, c;
}

start: ( 'test' test )+ EOF ;

test:
    {a=b=c=true;}   // Reset
    (   {a}? a {a=false;}
    |   {b}? b {b=false;}
    |   {c}? c {c=false;}
    )* ;

a: 'a';
b: 'b';
c: 'c';

WS : [ \t\r\n]+ -> skip ;

And the test driver...

package sandbox;

import org.antlr.v4.runtime.*;

public class Main {

    public static void main(String[] args) {
        new Main();
    }

    private Main() {
        System.out.println("Should be OK...");
        test("test a b c test c test c b a test c");
        System.out.println("Should fail...");
        test("test c a a");
    }

    private void test(String toTest) {
        final CharStream cs = CharStreams.fromString(toTest);
        final SandboxLexer lexer = new SandboxLexer(cs);
        final CommonTokenStream tokens = new CommonTokenStream(lexer);
        final SandboxParser parser = new SandboxParser(tokens);
        parser.start();
    }
}

Upvotes: 1

Mike Cargal
Mike Cargal

Reputation: 6785

While you can do something akin to:

rule: x
    | y
    | z
    | x y
    | y x
    | x z
    | z x
    | y z
    | z y
    | x y z
    | x z y
    | y x z
    | y z x
    | z x y
    | z y x;

or (a little less absurd):

rule: x? y? z?
    | x? z? y?
    | y? x? z?
    | y? z? x?
    | z? x? y?
    | z? y? x?
    ;

I'd suspect that you're example is much simpler than your real application, and this approach would become way too tedious (It's already getting absurd).

You could also look into something using semantic predicates, but that will lock your grammar to a particular target language. (It'll also complicate your grammar.)

In general, I find that ANTLR users (parser writers in general), often try too hard to encode "all the rules" into the grammar.

This seems like it would be nice, but it can lead to a lot of complexity in the grammar, and to "less than optimal" error messages (since they're coming from the parser (ANTLR) itself).

I think you'll find that it's better to keep a rule like you have which will create a ParseTree that accurately represents the right way to interpret (a.k.a. "parse") the input. Then you consider a rule like this to be a Semantic concern (as opposed to a Syntactic concern (the domain of the parser).

That means that you would write something like a validation Listener that would run against your parse tree, and you can check for use of the same child rule more than once. If you encounter it, you can then craft a very specific error message that is more useful to the end user.

Upvotes: 3

Related Questions