Reputation: 343
I am facing a performance issue with the ANTLR4 (Java) TokenStreamRewriter getText() method: it takes far too long to produce the result. Please check my use case and observations below, and let me know why this takes so long and whether any optimization can be done to reduce the time taken.
Use case: I have a SQL text file of 17 MB. After parsing the file, we modify many things in the token stream using the TokenStreamRewriter API. After all the modifications we call the getText() method of TokenStreamRewriter to get the modified text. The getText() method takes more than 5 minutes to produce the modified result.
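For reference, the pattern in use is roughly the following (a minimal sketch; SqlLexer, SqlParser, and the entry rule are placeholders for the actual generated grammar classes, not the real code):

    import org.antlr.v4.runtime.*;

    CharStream input = CharStreams.fromFileName("big-script.sql");
    SqlLexer lexer = new SqlLexer(input);                 // placeholder generated lexer
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    SqlParser parser = new SqlParser(tokens);             // placeholder generated parser
    parser.sqlScript();                                   // placeholder entry rule

    TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
    // ... many rewriter.insertBefore(...) / replace(...) / delete(...) calls ...

    String modified = rewriter.getText();                 // this call takes minutes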
I have also tested with a 54 MB file; this time the getText() method took around 110 minutes to produce the modified result.
During my analysis I found that the method taking the most time is TokenStreamRewriter.reduceToSingleOperationPerIndex().
I have captured some execution details of this method with timings; see the information below.
For 17MB File:
2020-05-14 17:22:33,187 INFO CustomTokenStreamRewriter:392 No of rewrites=164570
2020-05-14 17:22:33,187 INFO CustomTokenStreamRewriter:395 Start Walk Replaces
2020-05-14 17:25:35,874 INFO CustomTokenStreamRewriter:438 End Walk Replaces
2020-05-14 17:25:35,956 INFO CustomTokenStreamRewriter:441 Start Walk Inserts
2020-05-14 17:27:49,723 INFO CustomTokenStreamRewriter:476 End Walk Inserts
2020-05-14 17:27:49,732 INFO CustomTokenStreamRewriter:477 rewrites size after=164570
2020-05-14 17:27:49,732 INFO CustomTokenStreamRewriter:478 Start Walk after
2020-05-14 17:27:49,830 INFO CustomTokenStreamRewriter:488 End Walk after
For 54MB File:
2020-05-14 17:33:29,650 INFO CustomTokenStreamRewriter:392 No of rewrites=493728
2020-05-14 17:33:29,651 INFO CustomTokenStreamRewriter:395 Start Walk Replaces
2020-05-14 18:36:26,294 INFO CustomTokenStreamRewriter:438 End Walk Replaces
2020-05-14 18:36:26,362 INFO CustomTokenStreamRewriter:441 Start Walk Inserts
2020-05-14 19:21:56,112 INFO CustomTokenStreamRewriter:476 End Walk Inserts
2020-05-14 19:21:56,131 INFO CustomTokenStreamRewriter:477 rewrites size after=493728
2020-05-14 19:21:56,132 INFO CustomTokenStreamRewriter:478 Start Walk after
2020-05-14 19:21:56,224 INFO CustomTokenStreamRewriter:488 End Walk after
Env details: high-end Linux machine, memory (Xmx) set to 30 GB.
Upvotes: 0
Views: 113
Reputation: 12495
TokenStreamRewriter is lazy. So, contrary to what the question suggests, you are not calling getText after performing the modifications just to fetch the modified text. Modifications are only queued up by the API; it's when you call getText that the actual work happens.
Since the work is described as "modifying many things" in 17 megabytes of tokenized input, 5 minutes does not sound too far-fetched.
I guess the answer is: you are not experiencing any problem with getText, you are just observing the total time of your queued transformations. Whether they can be optimized or not looks like a matter for another question.
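A quick way to confirm this is to time the two phases separately (a sketch under the same setup as the question; shouldRewrite and newTextFor are hypothetical helpers standing in for your actual rewrite logic):

    long t0 = System.nanoTime();
    // Queueing edits: each call just records an operation, so this phase is fast.
    for (Token t : tokens.getTokens()) {
        if (shouldRewrite(t)) {                       // hypothetical predicate
            rewriter.replace(t, newTextFor(t));       // hypothetical helper
        }
    }
    long t1 = System.nanoTime();

    // reduceToSingleOperationPerIndex() and the final render both run inside here.
    String result = rewriter.getText();
    long t2 = System.nanoTime();

    System.out.printf("queue edits: %d ms, getText(): %d ms%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);

If the second number dominates, the cost really is the deferred application of the queued operations, not the fetching of the text itself.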
Upvotes: 2