mpobrien

Reputation: 4962

How can I access blocks of text as an attribute that are matched using a greedy=false option in ANTLR?

I have a rule in my ANTLR grammar like this:

COMMENT :  '/*' (options {greedy=false;} : . )* '*/' ;

This rule simply matches C-style comments: it accepts any pair of /* and */ with arbitrary text in between, and it works fine.

What I want to do now is capture all the text between the /* and the */ when the rule matches, to make it accessible to an action. Something like this:

COMMENT :  '/*' e=((options {greedy=false;} : . )*) '*/' {System.out.println("got: " + $e.text);} ;

This approach doesn't work: during parsing it gives "no viable alternative" upon reaching the first character after the "/*".

I'm not really clear on if/how this can be done - any suggestions or guidance welcome, thanks.

Upvotes: 0

Views: 301

Answers (2)

Bart Kiers

Reputation: 170227

Note that you can simply do:

getText().substring(2, getText().length()-2)

on the COMMENT token since the first and the last 2 characters will always be /* and */.
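To see the substring arithmetic in isolation, here is a minimal plain-Java sketch that runs without ANTLR. The class CommentText and the helper stripDelimiters are hypothetical names used only for illustration; in the grammar you would apply the same substring call to the token's getText() result. The length check is an added assumption to guard against text that is not a complete comment.

```java
public class CommentText {
    // Hypothetical helper (not part of ANTLR): returns the text between
    // the leading "/*" and the trailing "*/" of a complete comment.
    static String stripDelimiters(String tokenText) {
        // The shortest complete comment is "/**/", so require at least 4 chars.
        if (tokenText.length() < 4
                || !tokenText.startsWith("/*")
                || !tokenText.endsWith("*/")) {
            throw new IllegalArgumentException(
                    "not a complete /* ... */ comment: " + tokenText);
        }
        // Same arithmetic as getText().substring(2, getText().length()-2).
        return tokenText.substring(2, tokenText.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(">" + stripDelimiters("/* abc */") + "<"); // prints "> abc <"
    }
}
```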

You could also remove the options {greedy=false;} : part, since both .* and .+ are non-greedy by default (other subrules followed by * or + are greedy by default) (i).
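If the greedy vs. non-greedy distinction is unfamiliar, the following sketch shows the same idea using java.util.regex rather than ANTLR. This is only an analogy: ANTLR's lexer makes its loop-exit decision differently from a regex engine, and the class GreedyDemo and method findAll are hypothetical names for illustration. The observable effect on comment matching is similar, though: a greedy loop runs to the last */ in the input, while a non-greedy loop stops at the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Collects every match of the regex in the input; DOTALL lets '.'
    // match newlines, so multi-line comments are covered too.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "/* a */ x /* b */";
        // Greedy: one match spanning from the first "/*" to the last "*/".
        System.out.println(findAll("/\\*.*\\*/", input));  // [/* a */ x /* b */]
        // Reluctant (non-greedy): two separate comments.
        System.out.println(findAll("/\\*.*?\\*/", input)); // [/* a */, /* b */]
    }
}
```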

EDIT

Or use setText(...) on the Comment token to discard the /* and */ immediately. A little demo:

file T.g:

grammar T;

@parser::members {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream(
                "/* abc */   \n" +
                "            \n" + 
                "/*          \n" +
                "   DEF      \n" + 
                "*/            "
        );
        TLexer lexer = new TLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TParser parser = new TParser(tokens);
        parser.parse();
    }
}

parse
  :  ( Comment {System.out.printf("parsed :: >\%s<\%n", $Comment.getText());} )+ EOF
  ;

Comment
  :  '/*' .* '*/' {setText(getText().substring(2, getText().length()-2));}
  ;

Space
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

Then generate a parser & lexer, compile all .java files and run the parser containing the main method:

java -cp antlr-3.2.jar org.antlr.Tool T.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar TParser 
  (or `java -cp .;antlr-3.2.jar TParser` on Windows)

which will produce the following output:

parsed :: > abc <
parsed :: >          
   DEF      
<

(i) The Definitive ANTLR Reference, Chapter 4, Extended BNF Subrules, page 86.

Upvotes: 4

helloworld922

Reputation: 10939

Try this:

COMMENT :
  '/*' {StringBuilder comment = new StringBuilder();} ( options {greedy=false;} : c=. {comment.appendCodePoint(c);} )* '*/' {System.out.println(comment.toString());};

Another way, which actually returns the StringBuilder object so you can use it in your program:

COMMENT returns [StringBuilder comment]:
  '/*' {comment = new StringBuilder();} ( options {greedy=false;} : c=. {comment.append((char)c);} )* '*/';

Upvotes: 1
