inbarajan

Reputation: 31

How do I get the original text that an antlr4 C++ rule matched?

I am trying to port an ANTLR Java project to C++. In Java I was able to get the original text with help from "How do I get the original text that an antlr4 rule matched?", and it worked well.

My attempt in C++:

    CharStream *input = ctx->start->getInputStream();
    int a = ctx->start->getStartIndex();
    int b = ctx->start->getStopIndex();
    IntervalSet interval = IntervalSet(a,b);
    string text = input->getText(interval.getIntervals()[2]);

This "getIntervals()" call is helping me, just as it did in Java.

Not working properly in C++:

    CharStream *input = ctx->start->getInputStream();
    int a = ctx->start->getStartIndex();
    int b = ctx->start->getStopIndex();
    IntervalSet interval = IntervalSet(a,b);
    string text = input->getText(interval.getIntervals()[2]);

When I try it like this, I do not get the spaces:

    string text = ctx->getText();

The text comes back without spaces:

    intmain(){cout<<"Hello, World!";strncpy(pStr,pStart,len);for(i=0;i<10;i++){j=i*i;i=j/5;}return0;}

Upvotes: 1

Views: 1153

Answers (3)

vhora

Reputation: 330

If anybody arrives here looking for how to get the original text with Python 3 and ANTLR4, as I did, here is what worked for me.

  • In the grammar (.g4), ensure that whitespace and anything else that is needed (hints, comments, etc.) is written to the hidden channel:

        WS  : (' '|'\r'|'\t'|'\n') -> channel(HIDDEN)
        ;
    
  • In the listener, for any rule whose text is needed, do the following:

        def enterCreateTableStatement(self, ctx: HiveParser.CreateTableStatementContext):
          original_text = ctx.parser.getInputStream().getText(ctx.start, ctx.stop)
          print(original_text)
    

Upvotes: 1

inbarajan

Reputation: 31

Everything worked as expected :) Thanks to Mike: https://stackoverflow.com/users/1137174/mike-lischke

Changes:

I used misc/Interval.h, with ctx->getStart()->getStartIndex() as the start of the interval and ctx->getStop()->getStopIndex() as its end.

enterExpressionstatement:

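    // Inclusive character range: first character of the rule's first token
    // through last character of the rule's last token.
    Interval intvl = Interval();
    intvl.a = ctx->getStart()->getStartIndex();
    intvl.b = ctx->getStop()->getStopIndex();
    string text2 = input->getText(intvl);
    cout << "enterExpressionstatement " << text2 << endl;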
    .....
    .....
    cout <<"enterIterationstatement "<<text2<<endl;

Output:

$ ./parser cpp_forloop
enterExpressionstatement cout << "Hello, World!";
enterExpressionstatement strncpy(pStr, pStart, len);
enterIterationstatement for(i = 0; i < 10; i++ ) { 
 j = i*i; i = j/5; 
 }
enterExpressionstatement i = 0;
enterExpressionstatement j = i*i;
enterExpressionstatement i = j/5;
End of program cpp_forloop

Upvotes: 1

Mike Lischke

Reputation: 53522

The behavior of RuleContext::getText is the same in all targets: it retrieves the text of the given context by recursively appending the text of each subcontext to a single string. If your grammar skips whitespace or puts it on a different channel, these characters are not included in the result, because there is no (visible) match for them and hence they don't appear in the parse tree.

The getText() function on the input stream (taking an interval), however, is what gives you back the full original text (including all line breaks, comments, etc.). Your code is confusing, though:

  1. You included the same code twice. Once you write that it works, and once that it does not.
  2. You are accessing an interval in your set which doesn't exist (there's only one interval, at index 0). Why create an interval set in the first place? Just pass in Interval(a, b) instead.
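
To make the difference concrete, here is a minimal, self-contained sketch that does not use the ANTLR runtime at all. Tok, sliceInclusive, joinTokens and originalText are hypothetical names invented for this illustration; they only mimic the idea of concatenating visible tokens versus slicing the input by an inclusive character interval.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a token: inclusive character offsets into the input.
struct Tok {
    std::size_t start;  // index of the token's first character
    std::size_t stop;   // index of the token's last character (inclusive)
};

// Returns the text of the inclusive range [a, b], analogous to asking the
// input stream for an interval.
std::string sliceInclusive(const std::string& source, std::size_t a, std::size_t b) {
    assert(a <= b && b < source.size());
    return source.substr(a, b - a + 1);
}

// Concatenates only the visible tokens, roughly what a parse-tree getText()
// does: characters between tokens never show up in the result.
std::string joinTokens(const std::string& source, const std::vector<Tok>& tokens) {
    std::string out;
    for (const Tok& t : tokens) {
        out += sliceInclusive(source, t.start, t.stop);
    }
    return out;
}

// Slices the input from the first token's start to the last token's stop,
// so whitespace and comments in between are preserved.
std::string originalText(const std::string& source, const std::vector<Tok>& tokens) {
    assert(!tokens.empty());
    return sliceInclusive(source, tokens.front().start, tokens.back().stop);
}
```

For the input "i = j/5;" with one token per symbol, joinTokens drops the spaces, while originalText returns the input unchanged. Slicing only the first token's own range returns just "i", which is the range the question's code asks for, since it takes both indexes from ctx->start.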

Upvotes: 1
