Reputation: 1218
I created a generic parser that is fairly small in terms of lines of code but that I have been using successfully for my purposes. It handles recursive grammars, performs well, supports regex, and allows either normal tokenization or context-specific tokenization, which in turn lets otherwise conflicting tokens coexist in a grammar.
Due to the overall popularity of ANTLR I decided it may be worth learning more about it (maybe I've been reinventing the wheel), but before making the time investment I'd like to know whether it can do some of the things my parser currently does for me. Unfortunately, I wasn't able to find a comprehensive enough list of its features, at least not one that answers the questions I pose below.
Does ANTLR provide the features below?
My parser was designed to help with code completion like what you would see in an IDE. When it fails to parse an input, it always reports the possible tokens it should have matched at the place where the failure occurred. Similarly, for recursive rules, when an input parses successfully I can obtain information about the rules I would have to satisfy if the input were longer (or, in terms of code completion, if I were to keep typing).
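To make the completion behaviour concrete, here is a minimal Python sketch of the idea. It is not the actual parser and says nothing about how ANTLR works; the toy grammar, `ParseFailure`, and `completions` are all hypothetical names invented for illustration.

```python
import re

class ParseFailure(Exception):
    """Carries the position of the failure and the tokens that would have been accepted."""
    def __init__(self, pos, expected):
        super().__init__(f"at token {pos}: expected one of {sorted(expected)}")
        self.pos = pos
        self.expected = expected

def tokenize(text):
    # Toy tokenizer: numbers, parentheses and '+'; anything else is silently skipped.
    return re.findall(r"\d+|[()+]", text)

def parse_expr(tokens, i=0):
    # expr := term ('+' term)*
    i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] == "+":
        i = parse_term(tokens, i + 1)
    return i

def parse_term(tokens, i):
    # term := NUMBER | '(' expr ')'
    if i < len(tokens) and tokens[i].isdigit():
        return i + 1
    if i < len(tokens) and tokens[i] == "(":
        i = parse_expr(tokens, i + 1)
        if i < len(tokens) and tokens[i] == ")":
            return i + 1
        raise ParseFailure(i, {"+", ")"})
    raise ParseFailure(i, {"NUMBER", "("})

def completions(text):
    """Return the token kinds that could come next at the point where parsing stopped."""
    tokens = tokenize(text)
    try:
        end = parse_expr(tokens)
    except ParseFailure as failure:
        return failure.expected
    if end < len(tokens):
        return {"+", "EOF"}   # trailing input the grammar cannot consume
    return {"+"}              # complete parse: typing '+' would extend the expression
```

With this sketch, `completions("1 +")` reports that a number or an opening parenthesis is expected, and `completions("1")` reports that the already valid input could be continued with `+`.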
From the little I know about ANTLR, it seems to support a visitor pattern. My parser also uses a visitor pattern, but it additionally provides some context about the match, such as a stack with match depth information, among other things. For example, if a language allows nested functions, my visitor method lets me process only the functions at the level I care about. I also assume ANTLR provides the start and end index of each match.
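As an illustration of a depth-aware visitor, the sketch below uses Python's standard `ast` module as a stand-in. It is neither the parser described here nor ANTLR; the class `FunctionsAtDepth` and the helper `functions_at_depth` are hypothetical names. It collects only the functions at a chosen nesting level, together with the first and last source line of each match.

```python
import ast

SOURCE = """
def outer():
    def inner():
        pass

def other():
    pass
"""

class FunctionsAtDepth(ast.NodeVisitor):
    """Visitor that tracks function nesting depth while walking the tree."""
    def __init__(self, wanted_depth):
        self.wanted_depth = wanted_depth
        self.depth = 0
        self.found = []

    def visit_FunctionDef(self, node):
        if self.depth == self.wanted_depth:
            # Record the name plus the start and end line of the match.
            self.found.append((node.name, node.lineno, node.end_lineno))
        self.depth += 1            # entering the function body
        self.generic_visit(node)   # visit nested definitions
        self.depth -= 1            # leaving the function body

def functions_at_depth(source, wanted_depth):
    visitor = FunctionsAtDepth(wanted_depth)
    visitor.visit(ast.parse(source))
    return visitor.found
```

Calling `functions_at_depth(SOURCE, 0)` returns only the top-level functions, while a depth of `1` returns only `inner`.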
My parser supports regex, and in conjunction with the context-specific tokenization mode I can make some grammars dramatically smaller at the cost of some performance (not bad at all for a DSL). For example, I can have one token that matches the word "is" and another that matches the pattern "\w+", and the word "is" is translated to the appropriate token depending on the context, even though both tokens could match it. Does ANTLR support regex or something similar to this context-specific tokenizer?
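The "is" versus "\w+" case can be sketched in a few lines of Python. This is only an illustration of context-specific tokenization under the assumption that the parser tells the tokenizer which token types are acceptable at the current position; the `TOKENS` table and `next_token` are hypothetical and are not taken from the actual parser or from ANTLR.

```python
import re

# Hypothetical token table: both patterns can match the word "is".
TOKENS = {
    "IS": re.compile(r"is\b"),
    "WORD": re.compile(r"\w+"),
}

def next_token(text, pos, expected):
    """Try only the token types the parser expects at this position.

    Returns (token_name, matched_text, end_position), or None if none match.
    """
    for name in expected:
        match = TOKENS[name].match(text, pos)
        if match:
            return name, match.group(), match.end()
    return None

# The same input yields a different token depending on what the parser expects.
as_keyword = next_token("is ready", 0, ["IS"])
as_word = next_token("is ready", 0, ["WORD"])
```

Because the tokenizer never tries token types the grammar cannot accept at that point, the two patterns do not conflict, at the cost of running the tokenizer under the parser's control rather than as a separate pass.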
My parser supports a searching mode, which means that I don't need to parse the whole input: I can run through it and parse only the parts I'm interested in.
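A searching mode of this kind can be sketched as follows, assuming it works by attempting a rule at each position and skipping ahead on failure. The `CALL` pattern stands in for a real grammar rule and, like `search`, is a hypothetical name used only for this example.

```python
import re

# Hypothetical "rule": a call of the form name(args) without nested parentheses.
CALL = re.compile(r"(\w+)\(([^()]*)\)")

def search(text):
    """Scan the input, keeping only the spans where the rule matches."""
    pos, matches = 0, []
    while pos < len(text):
        match = CALL.match(text, pos)
        if match:
            # Record the rule's name capture with its start and end index.
            matches.append((match.group(1), match.start(), match.end()))
            pos = match.end()   # resume right after the matched span
        else:
            pos += 1            # no match here: skip one character and retry
    return matches
```

Input that the rule does not cover, such as the `???` in `"x = foo(1) + junk ??? bar(a, b)"`, is simply skipped instead of causing the whole parse to fail.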
Upvotes: 0
Views: 211
Reputation: 8075
I think that a PEG parser would be more suitable for your requirements. Keep in mind, though, that strictly separating lexing from parsing generally performs better.
If you do not already use DFA-based regexes for lexing and performance is an issue, then switching technology (either to ANTLR or to a PEG parser) could be a good next step.
Upvotes: 1