I am trying to port an ANTLR Java project to C++. In Java I was able to get the original text with the help of the question "How do I get the original text that an antlr4 rule matched?". It was awesome!
My attempt in C++:
CharStream *input = ctx->start->getInputStream();
int a = ctx->start->getStartIndex();
int b = ctx->start->getStopIndex();
IntervalSet interval = IntervalSet(a,b);
string text = input->getText(interval.getIntervals()[2]);
I hoped getIntervals() would help me the way it did in Java, but it does not work properly in C++.
When I try it like this instead, I don't get the spaces:
string text = ctx->getText();
I get the text without spaces:
intmain(){cout<<"Hello, World!";strncpy(pStr,pStart,len);for(i=0;i<10;i++){j=i*i;i=j/5;}return0;}
If anybody lands here looking for Python 3 and ANTLR4 like I did, here is the way that worked for me to get the original text.
In the grammar (.g4), make sure that whitespace and anything else that's needed (e.g. hints, comments, etc.) is sent to the hidden channel:
WS : (' '|'\r'|'\t'|'\n') -> channel(HIDDEN)
;
In the Listener, for any rule that the text is needed, do the following:
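def enterCreateTableStatement(self, ctx: HiveParser.CreateTableStatementContext):
    # getText(start, stop) on the token stream joins every token between the two,
    # including hidden-channel tokens, so the original spacing is kept
    original_text = ctx.parser.getInputStream().getText(ctx.start, ctx.stop)
    print(original_text)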
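In C++ terms, the token-stream approach above amounts to the following. This is a minimal, self-contained sketch that does not use the ANTLR runtime: SimpleToken, tokenRangeText and visibleText are hypothetical stand-ins, and it assumes (as in the answer) that whitespace tokens are kept on the hidden channel rather than skipped. Joining every token from the rule's start token to its stop token, regardless of channel, restores the spacing; keeping only default-channel tokens does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an ANTLR token: only the fields this sketch needs.
struct SimpleToken {
    std::string text;
    int channel;  // 0 = default channel, 1 = hidden channel (whitespace, comments)
};

constexpr int kDefaultChannel = 0;

// Concatenate the text of every token in [startIndex, stopIndex] (inclusive),
// hidden-channel tokens included, so the original spacing is preserved.
std::string tokenRangeText(const std::vector<SimpleToken>& tokens,
                           std::size_t startIndex, std::size_t stopIndex) {
    assert(startIndex <= stopIndex && stopIndex < tokens.size());
    std::string out;
    for (std::size_t i = startIndex; i <= stopIndex; ++i) {
        out += tokens[i].text;
    }
    return out;
}

// For contrast: keep only default-channel tokens, i.e. the tokens the parser
// actually matches, which is why their concatenation loses the whitespace.
std::string visibleText(const std::vector<SimpleToken>& tokens,
                        std::size_t startIndex, std::size_t stopIndex) {
    assert(startIndex <= stopIndex && stopIndex < tokens.size());
    std::string out;
    for (std::size_t i = startIndex; i <= stopIndex; ++i) {
        if (tokens[i].channel == kDefaultChannel) out += tokens[i].text;
    }
    return out;
}
```

For the tokens int, " ", x, " ", =, " ", 1, ; (spaces on the hidden channel), tokenRangeText over all eight returns "int x = 1;", while visibleText returns "intx=1;".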
Everything worked as expected :) Thanks to Mike: https://stackoverflow.com/users/1137174/mike-lischke
Changes:
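Used misc/Interval.h (a plain Interval instead of an IntervalSet), with ctx->getStart()->getStartIndex() as the start and ctx->getStop()->getStopIndex() as the end, so the end now comes from the rule's stop token rather than its start token.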
enterExpressionstatement:
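// Interval comes from misc/Interval.h (antlr4::misc::Interval)
CharStream *input = ctx->getStart()->getInputStream();
Interval intvl = Interval();
intvl.a = ctx->getStart()->getStartIndex();  // first char of the rule's first token
intvl.b = ctx->getStop()->getStopIndex();    // last char of the rule's last token
string text2 = input->getText(intvl);        // original text, whitespace included
cout << "enterExpressionstatement " << text2 << endl;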
.....
.....
cout <<"enterIterationstatement "<<text2<<endl;
Output:
$ ./parser cpp_forloop
enterExpressionstatement cout << "Hello, World!";
enterExpressionstatement strncpy(pStr, pStart, len);
enterIterationstatement for(i = 0; i < 10; i++ ) {
j = i*i; i = j/5;
}
enterExpressionstatement i = 0;
enterExpressionstatement j = i*i;
enterExpressionstatement i = j/5;
End of program cpp_forloop
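The key change is where the end of the interval comes from. In the question, b was ctx->start->getStopIndex(), the last character of the rule's first token, so the slice covers only that token. The sketch below is a self-contained model without the ANTLR runtime (CharSpanToken, sliceInclusive and ruleText are hypothetical names). It assumes, as in ANTLR, that token start/stop indexes are inclusive character offsets, and it assumes ASCII input, since the C++ runtime counts code points rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of a token's position in the character stream.
// Like Token::getStartIndex()/getStopIndex(), both offsets are inclusive.
struct CharSpanToken {
    std::size_t startIndex;
    std::size_t stopIndex;
};

// Inclusive slice [a, b] of the original input, modelling what
// getText(Interval(a, b)) on the input stream returns for a valid interval.
std::string sliceInclusive(const std::string& input, std::size_t a, std::size_t b) {
    assert(a <= b && b < input.size());
    return input.substr(a, b - a + 1);
}

// Original text of a rule: from its first token's first character to its
// last token's last character, so whitespace in between is kept.
std::string ruleText(const std::string& input, const CharSpanToken& start,
                     const CharSpanToken& stop) {
    return sliceInclusive(input, start.startIndex, stop.stopIndex);
}
```

For example, with input "i = j/5;" and tokens i at [0, 0] and ; at [7, 7], ruleText returns "i = j/5;", whereas slicing only up to the first token's stop index returns just "i".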
The behavior of RuleContext::getText is the same in all targets: it retrieves the text of the given context by adding the text of each subcontext to a single string (recursively). If your grammar skips whitespace or puts it on a different channel, then these characters are not included in the result, because there is no (visible) match for them and hence they don't appear in the parse tree.
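To make the recursion concrete, here is a toy parse tree in plain C++. Node, leaf and concatText are hypothetical, not ANTLR types. Leaves hold only tokens the parser matched. Whitespace never becomes a leaf because it was skipped or sent to another channel, so concatenating leaf text recursively gives intmain() rather than int main().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy parse-tree node: a leaf carries token text, an inner node has children.
struct Node {
    std::string tokenText;                        // used only by leaves
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> leaf(const std::string& text) {
    auto n = std::make_unique<Node>();
    n->tokenText = text;
    return n;
}

// Recursively append the text of every leaf below `node`, with no separator,
// in the same spirit as RuleContext::getText().
std::string concatText(const Node& node) {
    if (node.children.empty()) return node.tokenText;
    std::string out;
    for (const auto& child : node.children) out += concatText(*child);
    return out;
}
```

A node whose leaves are int, main, ( and ) yields intmain(), the same shape as the output in the question.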
But the getText() function on the input stream (taking an interval) is what gives you back the full original text (including all line breaks, comments etc.). Your given code however is confusing: rather than creating an IntervalSet and indexing into getIntervals(), use Interval(a, b).