Reputation: 2765
According to Vala documentation: "Before 0.3.1, Vala's parser was the classic flex scanner and Bison LALR parser combination. But as of commit eba85a, the parser is a hand-crafted recursive descent parser." My question is: Why?
The question could be addressed to any compiler which isn't using parser generator. What are pros and cons for such a move from parser generator to hand-crafted parser? What are disadvantages of using parser generators (Bison, ANTLR) for compilers?
As a side comment: I'm interested in Vala specifically because I like the idea of having language with modern features and clean syntax but compilable into "native" and "unmanaged" high-level language (C in case of Vala). I have found only Vala so far. I'm thinking of having fun by making Vala (or similar language) compilable to C++ (backed by Qt libs). But since I don't want to invent completely new language I'm thinking of taking some existing grammar. Obviously hand-crafted parsers don't have written formal grammar I might reuse. Your comments on this idea are welcome (is the whole idea silly?).
Upvotes: 30
Views: 7884
Reputation: 311050
It is curious that these authors went from bison to RD. Most people would go in the opposite direction.
The only real reason I can see to do what the Vala authors did would be better error recovery, or maybe their grammar isn't very clean.
I think you will find that most new languages start out with hand-written parsers as the authors get a feel for their own new language and figure out exactly what it is they want to do. Also in some cases as the authors learn how to write compilers. C is a classic example, as is C++. Later on in the evolution a generated parser may be substituted. Compilers for existing standard languages on the other hand can more rapidly be developed via a parser generator, possibly even via an existing grammar: time to market is a critical business parameter of these projects.
Upvotes: 0
Reputation: 17532
I know this isn't going to be definitive, and if your questions weren't specifically Vala-related I wouldn't bother, but since they are...
I wasn't too heavily involved with the project back then so I'm not that clear on some of the details, but the big reason I remember from when Vala switched was dogfooding. I'm not certain it was the primary motivation, but I do remember that it was a factor.
Maintainability was also an issue. That patch replaced a larger parser written in C/Bison/YACC (relatively few people have significant experience with the latter two) with a smaller parser in Vala (which pretty much anyone interested in working on valac probably knows and is comfortable with).
Better error reporting was also a goal, IIRC.
I don't know if it was a factor at all, but the hand-written parser is a recursive descent parser. I know ANTLR generates those, the ANTLR is written in Java, which is a pretty heavy dependency (yes, I know it's not a run-time dependency, but still).
As a side comment: I'm interested in Vala specifically because I like the idea of having language with modern features and clean syntax but compilable into "native" and "unmanaged" high-level language (C in case of Vala). I have found only Vala so far. I'm thinking of having fun by making Vala (or similar language) compilable to C++ (backed by Qt libs). But since I don't want to invent completely new language I'm thinking of taking some existing grammar. Obviously hand-crafted parsers don't have written formal grammar I might reuse. Your comments on this idea are welcome (is the whole idea silly?).
A lot of Vala is really a reflection of decisions made by GObject, and things may or may not work the same way in C++/Qt. If your goal is to replace GObject/C in valac with Qt/C++, you're probably in for more work than you expect. If, however, you just want to make C++ and Qt libraries accessible from Vala, that should certainly be possible. In fact, Luca Bruno started working on that about a year ago (see the wip/cpp branch). It hasn't seen activity for a while due to lack of time, not technical issues.
Upvotes: 2
Reputation: 611
I have written half a dozen hand crafted parsers (in most cases recursive descent parser AKA top-down parser) in my career and have seen parsers generated by parser generators and I must admit I am biased against parser generators.
Here are some pros and cons for each approach.
Parser generators
Pros:
Cons:
Hand crafted recursive descent parser
Pros:
Cons:
It will take some time to hand code the parser especially if you do not have an experience with this stuff.
The parser may be somewhat slow. This applies to all recursive parsers not just hand written ones. Having one function corresponding to each language construct to parse a simple numeric literal the parser may make a dozen or more nested calls starting from e.g. parseExpression through parseAddition, parseMultiplication, etc. parseLiteral. Function calls are relatively inexpensive in a language like C but still cam sum up to a significant time.
One solution to speedup a recursive parser is to replace parts of your recursive parser by a bottom-up sub-parser which often is much faster. The natural candidates for such sub-parser are the expressions which have almost uniform syntax (i.e. binary and unary expressions) with several precedence levels. The bottom-up parser for an expression is usually also simple to hand code, it is often just one loop getting input tokens from the lexer, a stack of values and a lookup table of operator precedence’s for operator tokens.
Upvotes: 29
Reputation: 7153
LR(1) and LALR(1) parsers are really, really annoying for two reasons:
On the other hand, LL(1) grammar are much better at both of these things. The structure of LL(1) grammars makes them very easy to encode as recursive descent parsers, and so dealing with a parser generator is not really a win.
Also, in the case of Vala, the parser and the compiler itself as presented as a library, so you can build a custom backend for the Vala compiler using the Vala compiler library and get all the parsing and type checking and such for free.
Upvotes: 19
Reputation: 15642
According to Vala documentation: "Before 0.3.1, Vala's parser was the classic flex scanner and Bison LALR parser combination. But as of Commit eba85a, the parser is a hand-crafted recursive descent parser." My question is: Why?
Here I'm specifically asking about Vala, although the question could be addressed to any compiler which isn't using parser generator. What are pros and cons for such a move from parser generator to hand-crafted parser? What are disadvantages of using parser generators (Bison, ANTLR) for compilers?
Perhaps the programmers spotted some avenues for optimisation that the parser generator didn't spot, and these avenues for optimisation required an entirely different parsing algorithm. Alternatively, perhaps the parser generator generated code in C89, and the programmers decided refactoring for C99 or C11 would improve legibility.
As a side comment: I'm interested in Vala specifically because I like the idea of having language with modern features and clean syntax but compilable into "native" and "unmanaged" high-level language (C in case of Vala).
Just a quick note: C isn't native. It has it's roots in portability, as it was designed to abstract away those hardware/OS-specific details that caused programmers so much grief when porting. For example, think of the pain of using an entirely different fopen for each OS and/or filesystem; I don't just mean different in functionality, but also different in input and output expectations eg. different arguments, different return values. Similarly, C11 introduces portable threads; Code that uses threads will be able to use the same C11-compliant code to target all OSes that implement threads.
I have found Vala so far. I'm thinking of having fun by making Vala (or similar language) compilable to C++ (backed by Qt libs). But since I don't want to invent completely new language I'm thinking of taking some existing grammar. Obviously hand-crafted parsers don't have written formal grammar I might reuse. Your comments on this idea are welcome (is the whole idea silly?).
It may be feasible to use the hand-crafted parser to produce C++ code with little effort, so I wouldn't toss this option aside so quickly; The old flex/bison parser generator may or may not be more useful. However, this isn't your only option. In any case, I'd be tempted to study the specification extensively.
Upvotes: 0