MustafaM
MustafaM

Reputation: 493

Can Bison verify scope as well as syntax?

I am using Lex/Bison to create a simple scripting language which translates into C. A trans-compiler.

I know Bison can verify syntax, but what about scoping? Can it verify the an identifier that is being used was declared before? Does that have to be done manually? And if so, at which step?

E.g. This is syntax-wise correct. But is should not compile because message is in the wrong scope. In addition, message2 was never declared.

String s1
if (s1=='knock on door')
   String message1='Hello';

print message1;
print message2;

Upvotes: 2

Views: 1823

Answers (4)

Wei Zhong
Wei Zhong

Reputation: 590

One way to handle it in Bison without going through another pass after you obtain AST (abstract syntax tree) is,

  • Create a data structure called "symbol table" that supports nested-scope, basically it is a tree of hashtables where each time you pop_scope() and push_scope() to a new scope, you switch to a new sibling node of scope with dead scopes information still saved.
  • At your Bison rules where you create a new scope (so in your example, the if statement), use Bison midrule action feature to invoke push_scope() at "entrance token" (if statement, { block begin, etc.). And invoke pop_scope() at the main action body.

Put in a concrete way, in your language, your original rule may look like:

selection-statement: _IF '(' expression ')' statement {
  /* main action */
}

With midrule action, you can insert an action for _IF, and pop, push scopes at corresponding spot directly in Bison, and it becomes:

selection-statement: _IF { push_scope(); } '(' expression ')' statement {
  /* main action */
  pop_scope();
}

So in your example, at the time symbol message1 is defined, push_scope() is already being called at the point where parser encounters the _IF token when Bison is reducing this rule. And then when you try to read message1 out side of its scope, your symbol table interface should return some error and you should be able to throw error message at Bison using yyerror().

P.S. When you insert a midrule action, you should shift the reference numbers (e.g., $0, $1) for expressions. For a complex rule with midrule actions, this soon gets really messy and hard to read. A useful trick is using Bison named references to avoid mistakenly using wrong reference names.

Upvotes: 0

Ira Baxter
Ira Baxter

Reputation: 95392

Parsers (however produced) typically cannot check scoping rules.

To do scope checking in general, in general you have to build the entire AST and then verify the identifiers are well-scoped. For languages which have static scoping with scopes introduced before variable use, you might be able to do this only the fly in rules. For languages with namespaces, the namespace declaration might occur anywhere, and you can't do it without collecting that namespace and thus the entire program. (The namespace declaration might not even be in the same file as the use; now you have parse an entirely different file).

As a consequence, the usual way scope checking is done is after parsing, over the program tree. Normally if you are going to do anything beyond parsing, you need the tree, and the full symbol table anyway, so this isn't really much of an annoyance.

Some early compilers didn't have the luxury of holding the entire tree in memory. Either they had languages with scope declarations before use, or they were implemented in multiple passes.

Upvotes: 2

MustafaM
MustafaM

Reputation: 493

Found the answer. There is actually a third phase called Semantic Analysis. The steps:

  1. Lex (get tokens)
  2. Parse (make sure token used in correct context)
  3. Semantic Analysis (type checking/scoping)

Semantic Analysis analyzes the parse tree that is created in Step 2.

Upvotes: 1

Mike DeSimone
Mike DeSimone

Reputation: 42825

Bison does not handle identifier scoping and lookup. That functionality is up to you to provide in rule actions.

Upvotes: 0

Related Questions