jj1999

Reputation: 11

Is it possible to count the number of times an element appears when parsing?

I am trying to find a way to keep track of the number of times an element appears when building a parse tree. Let's say my grammar looks something like this:

grammar sample;

@members {
     private int xCount= 0;
     private int yCount= 0;
     private int zCount= 0;
}
// rules
fileData        : part1 part2 EOF;
part1           : x+ y+ z+;
part2           : z+ y+ x+;

// I want to count these:
x               : a+ b+ c+ {xCount++;};
y               : c+ b+ a+ {yCount++;};
z               : a+ c+ b+ {zCount++;};

// lexer rules
a               : '1';
b               : '2';
c               : '3';

How can I use these counters once parsing is complete, for example to print something like {System.out.println("found 4 x's, 5 y's, and 6 z's");}?

Upvotes: 1

Views: 73

Answers (1)

Bart Kiers

Reputation: 170188

You can simply print it at the end of your start rule (fileData):

fileData
 : part1 part2 EOF
   {System.out.println("found...");}
 ;

or add a method to your @members section:

grammar sample;

@members {
     private int xCount = 0;
     private int yCount = 0;
     private int zCount = 0;

     public String getCounters() {
         return String.format("found %d x's, %d y's, and %d z's", xCount, yCount, zCount);
     }
}
// rules
fileData        : part1 part2 EOF;
part1           : x+ y+ z+;
part2           : z+ y+ x+;

// I want to count these:
x               : a+ b+ c+ {xCount++;};
y               : c+ b+ a+ {yCount++;};
z               : a+ c+ b+ {zCount++;};

// lexer rules
a               : '1';
b               : '2';
c               : '3';

and then use it like this:

String input = "123321132132321123";
sampleLexer lexer = new sampleLexer(CharStreams.fromString(input));
sampleParser parser = new sampleParser(new CommonTokenStream(lexer));

System.out.println(parser.fileData().toStringTree(parser));
System.out.println(parser.getCounters());

which will print:

(fileData (part1 (x (a 1) (b 2) (c 3)) (y (c 3) (b 2) (a 1)) (z (a 1) (c 3) (b 2))) (part2 (z (a 1) (c 3) (b 2)) (y (c 3) (b 2) (a 1)) (x (a 1) (b 2) (c 3))) <EOF>)
found 2 x's, 2 y's, and 2 z's

However, a better option (IMO) is to separate the Java code from your grammar and move the counting into a listener. First, remove the embedded code from the grammar:

grammar sample;

// rules
fileData        : part1 part2 EOF;
part1           : x+ y+ z+;
part2           : z+ y+ x+;

// I want to count these:
x               : a+ b+ c+;
y               : c+ b+ a+;
z               : a+ c+ b+;

// lexer rules
a               : '1';
b               : '2';
c               : '3';

and create a listener:

class CounterListener extends sampleBaseListener {

  private int xCount = 0;
  private int yCount = 0;
  private int zCount = 0;

  @Override
  public void enterX(sampleParser.XContext ctx) {
    this.xCount++;
  }

  @Override
  public void enterY(sampleParser.YContext ctx) {
    this.yCount++;
  }

  @Override
  public void enterZ(sampleParser.ZContext ctx) {
    this.zCount++;
  }

  public String getCounters() {
    return String.format("In listener found %d x's, %d y's, and %d z's", this.xCount, this.yCount, this.zCount);
  }
}

and use it like this:

String input = "123321132132321123";
sampleLexer lexer = new sampleLexer(CharStreams.fromString(input));
sampleParser parser = new sampleParser(new CommonTokenStream(lexer));

CounterListener listener = new CounterListener();
ParseTreeWalker.DEFAULT.walk(listener, parser.fileData());

System.out.println(listener.getCounters());

resulting in:

In listener found 2 x's, 2 y's, and 2 z's

See: https://github.com/antlr/antlr4/blob/master/doc/listeners.md
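
If it helps to see why the listener ends up with these numbers, here is a rough, ANTLR-free sketch of what a tree walk does: it visits every node depth-first and fires an "enter" callback per node. Note that Node, Listener, walk, countRules and sampleTree are hypothetical stand-ins I made up for illustration, not ANTLR classes or methods, and the tree is built by hand to mirror the sample input's parse tree only down to the x/y/z level (the a/b/c leaves are left out).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins, NOT ANTLR classes: a bare-bones tree node and listener.
class WalkSketch {

  /** A parse-tree node that only remembers its rule name and children. */
  static final class Node {
    final String rule;
    final List<Node> children = new ArrayList<>();

    Node(String rule, Node... kids) {
      this.rule = rule;
      for (Node kid : kids) {
        children.add(kid);
      }
    }
  }

  /** Callback invoked once for every node the walker enters. */
  interface Listener {
    void enter(Node node);
  }

  /** Depth-first walk: notify the listener, then descend into the children. */
  static void walk(Listener listener, Node node) {
    listener.enter(node);
    for (Node child : node.children) {
      walk(listener, child);
    }
  }

  /** Counts how often each rule name is entered during one walk. */
  static Map<String, Integer> countRules(Node root) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    walk(node -> counts.merge(node.rule, 1, Integer::sum), root);
    return counts;
  }

  /** Hand-built tree shaped like the sample parse tree (x/y/z level only). */
  static Node sampleTree() {
    return new Node("fileData",
        new Node("part1", new Node("x"), new Node("y"), new Node("z")),
        new Node("part2", new Node("z"), new Node("y"), new Node("x")));
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = countRules(sampleTree());
    System.out.printf("found %d x's, %d y's, and %d z's%n",
        counts.get("x"), counts.get("y"), counts.get("z"));
  }
}
```

The counting lives entirely in the callback, so the tree (and, in the real setup, the grammar) stays free of application code. That is the same separation the CounterListener above gives you.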

Note that the comment "// lexer rules" in:

// lexer rules
a               : '1';
b               : '2';
c               : '3';

is not really correct: they're parser rules, not lexer rules. The snippet above is interpreted by ANTLR like this:

a               : T_0;
b               : T_1;
c               : T_2;

T_0 : '1';
T_1 : '2';
T_2 : '3';

where T_0, T_1 and T_2 are lexer rules that ANTLR creates implicitly for the literal tokens '1', '2' and '3'.

Upvotes: 2
