Roel
Roel

Reputation: 19642

Questions about grammar / parser theory

I have recently finished my first parser/compiler for a special-purpose programming language. I had some freedom in the specification of this language and in some places I tweaked the spec to make it easier to parse; which has, in some places, resulted in some (so far) minor points for improvement in the language itself. Maybe I'll get to fix those in some future v2; for now I'm still digesting the things I've learned from this process and I have a few questions about this. I never took a formal course on compiler design so I want verify that my understanding is in line with the state of the art in this field.

Upvotes: 3

Views: 351

Answers (1)

Paolo Capriotti
Paolo Capriotti

Reputation: 4072

  1. I think you are confusing the terminology a little. Parsing only deals with going from source to AST, and is divided in "stages", not "passes". Using the word "pass" implies that they are performed sequentially, which is in general not true.

    The reason why in practice lexical analysis (i.e. tokenization) and actual parsing are intertwined is because of syntactic peculiarities in the target language: the syntax of many real languages is better described by defining "context sensitive" tokens. Think of languages with string interpolation, or user definable operators.

  2. The parse table of an LL(k) grammars grows exponentially with k in the worst case, I suppose that's what got them a bad name (for k > 1). If you are manually implementing a recursive descent parser, the lookahead doesn't really matter. Backtracking is still bad, however, since it can cause exponential runtime.

  3. You can change a grammar without affecting the target language. Ensuring that a grammar falls within a certain class allows you to use existing tools (like LALR parser generators), and gives you some information on the runtime characteristics of the parser. I do agree that language design shouldn't be guided by parsing considerations, although a little common sense does help or you'll end up with an unparsable jumble like C++.

Upvotes: 5

Related Questions