mmccoo
mmccoo

Reputation: 8536

How do I rewrite a subtree with a composite root using ANTLR?

I have an antlr grammer with subtrees like this:

 ^(type ID)

that I want to convert to:

 ^(type DUMMY ID)

where type is 'a'|'b'.

Note: what I really want to do is convert anonymous instantiations to explicit by generating dummy names.

I've narrowed it down to the grammars below, but I'm getting this:

(a bar) (b bar)
got td
got bu
Exception in thread "main" org.antlr.runtime.tree.RewriteEmptyStreamException: rule type
        at org.antlr.runtime.tree.RewriteRuleElementStream._next(RewriteRuleElementStream.java:157)
        at org.antlr.runtime.tree.RewriteRuleSubtreeStream.nextNode(RewriteRuleSubtreeStream.java:77)
        at Pattern.bu(Pattern.java:382)

The error message continues. My debug so far:

  1. The input made it through the initial grammar generating two trees. a bar and b bar.
  2. The second grammar does match the trees. it's printing td and bu.
  3. The rewrite crashes, but I have no idea why? What does RewriteEmptyStreamException mean.

What the proper way to do this kind of a rewrite?

My main grammer Rewrite.g:

grammar Rewrite;

options {
  output=AST;
}

@members{
  public static void main(String[] args) throws Exception {
      RewriteLexer lexer = new RewriteLexer(new ANTLRStringStream("a foo\nb bar"));
      RewriteParser parser = new RewriteParser(new CommonTokenStream(lexer));
      CommonTree tree = (CommonTree)parser.test().getTree();
      System.out.println(tree.toStringTree());

      CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);        
      Pattern p = new Pattern(nodes);

      CommonTree newtree = (CommonTree) p.downup(tree);
  }
}

type 
    : 'a'
    | 'b'
    ;

test : id+;
id   : type ID  -> ^(type ID["bar"]);

DUMMY : 'dummy';
ID   : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};

and Pattern.g

tree grammar Pattern;

options {
    tokenVocab = Rewrite;
    ASTLabelType=CommonTree;
    output=AST;
    filter=true;             // tree pattern matching mode
}

topdown
    : td
    ;

bottomup
    : bu
    ;

type 
    : 'a'
    | 'b'
    ;

td
    : ^(type ID) { System.out.println("got td"); }
    ;

bu
    : ^(type ID) { System.out.println("got bu"); }
        -> ^(type DUMMY ID)
    ;

to do compile:

java -cp ../jar/antlr-3.4-complete-no-antlrv2.jar org.antlr.Tool Rewrite.g 
java -cp ../jar/antlr-3.4-complete-no-antlrv2.jar org.antlr.Tool Pattern.g 
javac -cp ../jar/antlr-3.4-complete-no-antlrv2.jar *.java 
java -classpath .:../jar/antlr-3.4-complete-no-antlrv2.jar RewriteParser

EDIT 1: I have also tried using antlr4 and I get the same crash.

Upvotes: 2

Views: 1072

Answers (2)

user1201210
user1201210

Reputation: 3809

There are two small problems to address to get the rewrite to work, one problem in Rewrite and the other in Pattern.

The Rewrite grammar produces ^(type ID) as root elements in the output AST, as shown in the output (a bar) (b bar). A root element can't be transformed because transforming is actually a form of child-swapping: the element's parent drops the element and replaces it with the new, "transformed" version. Without a parent, you'll get the error Can't set single child to a list. Adding the root is a matter of creating an imaginary token ROOT or whatever name you like and referencing it in your entry-level rule's AST generation like so: test : id+ -> ^(ROOT id+);.

The Pattern grammar, the one producing the error you're getting, is confused by the type rule: type : 'a' | 'b' ; as part of the rewrite. I don't know the low-level details here, but apparently a tree parser doesn't maintain the state of a visited root rule like type in ^(type ID) when writing a transform (or maybe it can't or shouldn't, or maybe it's some other limitation). The easiest way to address this is with the following two changes:

  • Let text "a" and "b" match rule ID in the lexer by changing rule type in Rewrite from type: 'a' | 'b'; to just type: ID;.
  • Let rule bu in Pattern match against ^(ID ID) and transform to ^(ID DUMMY ID).

Now with a couple of minor debugging changes to Rewrite's main, input "a foo\nb bar" produces the following output:

(ROOT (a foo) (b bar))
got td
got bu
(a foo) -> (a DUMMY foo)
got td
got bu
(b bar) -> (b DUMMY bar)
(ROOT (a DUMMY foo) (b DUMMY bar))

Here are the files as I've changed them:

Rewrite.g

grammar Rewrite;

options {
  output=AST;
}

tokens { 
 ROOT;
}

@members{
  public static void main(String[] args) throws Exception {
      RewriteLexer lexer = new RewriteLexer(new ANTLRStringStream("a foo\nb bar"));
      RewriteParser parser = new RewriteParser(new CommonTokenStream(lexer));
      CommonTree tree = (CommonTree)parser.test().getTree();
      System.out.println(tree.toStringTree());

      CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);        
      Pattern p = new Pattern(nodes);

      CommonTree newtree = (CommonTree) p.downup(tree, true); //print the transitions to help debugging
      System.out.println(newtree.toStringTree()); //print the final result
  }
}

type : ID;
test : id+ -> ^(ROOT id+);
id   : type ID  -> ^(type ID);

DUMMY : 'dummy';
ID   : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};

Pattern.g

tree grammar Pattern;

options {
    tokenVocab = Rewrite;
    ASTLabelType=CommonTree;
    output=AST;
    filter=true;             // tree pattern matching mode
}

topdown
    : td
    ;

bottomup
    : bu
    ;

td
    : ^(ID ID) { System.out.println("got td"); }
    ;

bu
    : ^(ID ID) { System.out.println("got bu"); }
        -> ^(ID DUMMY ID)
    ;

Upvotes: 2

Bart Kiers
Bart Kiers

Reputation: 170188

I don't have much experience with tree-patterns, with or without rewrites. But when using rewrite rules in them, I believe your options should also include rewrite=true;. The Definitive ANTLR Reference doesn't handle them, so I'm not entirely sure (have a look at the ANTLR wiki for more info).

However, for such (relatively) simple rewrites, you don't really need a separate grammar. You could make DUMMY an imaginary token and inject it in some other parser rule, like this:

grammar T;

options {
  output=AST;
}

tokens {
  DUMMY;
}

test : id+;
id   : type ID  -> ^(type DUMMY["dummy"] ID);

type 
 : 'a'
 | 'b'
 ;

ID : ('a'..'z')+;
WS : (' '|'\n'|'r')+ {$channel=HIDDEN;};

which would parse the input:

a bar 
b foo

into the following AST:

enter image description here

Note that if your lexer is also meant to tokenize the input "dummy" as a DUMMY token, change the tokens { ... } block into this:

tokens {
  DUMMY='dummy';
}

and you'd still be able to inject a DUMMY in other rules.

Upvotes: 1

Related Questions