What is the highest stage in compilation, where a "name" can be entered into the symbol table?

Question

Below is an excerpt from the red dragon book.

The symbol-table entry itself can be set up when the role of a name becomes clear, with the attribute values being filled in as the information becomes available.

In some cases, the entry can be initiated from the lexical analyzer as soon as a name is seen in the input.

More often, one name may denote several different objects, perhaps even in the same block or procedure. For example, the C declarations:
   int x; 
   struct x { float y, z; };          //(7.1)
use x as both an integer and as the tag of a structure with two fields. In such cases, the lexical analyzer can only return to the parser the name itself (or a pointer to the lexeme forming that name), rather than a pointer to the symbol-table entry.

The record in the symbol table is created when the syntactic role played by this name is discovered. For the declarations in (7.1), two symbol-table entries for x would be created; one with x as an integer and one as a structure.

In the introductory chapters of the text, I had learnt a technique, where the identifies were entered into the symbol table as when they were encountered by the lexical analyzer.( Similar to what is said in point 2 in the excerpt).

But what is said in point 3, I am unaware of it. It says that the entry shall be made only when the syntactic role of the declaration is made clear. In that case the name has to be entered only after the syntax analysis is completed.

But in doing SDT for adding types to the symbol table with grammars like this:

It was assumed that the lexeme entry is already present in the symbol table and we add type to it.

I am getting confused. If symbol table entries are to happen after syntax analysis then how can the SDT be integrated with the syntax analysis ?

rici · Accepted Answer

I'll take on the particular case of C, since it's hard to answer questions about hypothetical situations in hypothetical languages. Still, the concrete issues raised by parsing C are probably indicative of the kind of strategy you might use to handle other concrete issues. In any event, much of the content of a textbook like the Dragon book (or any other text on compiler construction) needs to be read as an exploration of possibilities rather than as a recipe-book. If there were a few recipes which could be copied, the topic would be much less interesting.

Not counting macros (which are a totally different discussion), C has three general namespaces plus a namespace for each composite type (that is, structure and unions). These are listed in §6.2.3 of the C standard:

statement labels.
tags used to identify struct, union and enum types.
ordinary identifiers, which can be variable names or type aliases.

These namespaces are completely separate; the use of a name for a label or the tag of a union type does not have any relationship whatsoever with its use as a variable or a type alias, and neither of those have any relationship with the use of the same identifier token to name members of one or more aggregate types. Although it's very simple to decide which namespace a particular token should be looked up in, doing so requires a some context (quite a bit in the case of member names) and its not very likely that the work will be done in the lexical scanner. So it's reasonable to assume that the lexical scanner will intern identifiers in some kind of catalog of names (to avoid unnecessary copying of identifier strings), and leave it to the parser to use the identifier as it sees fit.

In most cases, the namespace can be almost immediately resolved, and the actual symbol table entry will be created in the appropriate syntax table through an attribute rule in a syntax production (or the moral equivalent in compilers which are not strictly syntax-driven).

For example, an aggregate tag always follows the token enum, struct or union in a type declaration, while a label is always followed by a : and a statement. The relevant productions are unambiguous and relatively low-level, and a newly-created identifier can easily be entered into the correct symbol table before any possibility of another use of the same identifier in the same namespace.

Here's a part of the C grammar:

type-specifier
    : ... (lots of keywords like `int`, `double`, etc.)
    | struct-or-union-specifier
    | enum-specifier
    | ...
struct-or-union-specifier
    : struct-or-union identifier
    | struct-or-union identifier '{' struct-declaration-list '}'
    | struct-or-union '{' struct-declaration-list '}'
struct-or-union
    : "struct"
    | "union"
...

In order to correctly handle the identifier in the first two productions for struct-or-union-specifier, we need to look the identifier up in the symbol table for aggregate tags and set some attribute to that symbol table entry. We also need to deal with untagged struct and union types, which will also need a symbol table entry to hold data about the aggregate type. (Calling these symbol table entries is a bit misleading since untagged types are never actually entered into any symbol table but the other information about the type is the same whether or not the type is tagged, so it makes sense to keep it in a SymbolTableEntry data object.)

It's legal to use the aggregate tag as soon as it has been entered; in particular, as part of a declaration of a member in the struct-declaration-list. So we're going to need to look up or create the entry in the middle of the struct-declaration-list production, just before the { token. But there are lots of inherited attributes of this form, and there's nothing difficult about the operation. We just add an inherited attribute rule:

struct-or-union-specifier
    : struct-or-union identifier 
                     { struct-or-union.entry := new_tag_symbol(identifier.string); }
      '{' struct-declaration-list '}'
                     { /* Attach the definitions in struct-declaration-list
                          to the aggregate type entry in struct-or-union.entry
                        */
                     }

(The rule for untagged creates a new unnamed aggregate type instead of trying to enter the name in a symbol table, but the principle is no different.)

This will be very similar to the way labels are entered in the label symbol table (which is always scoped to the current function, unlike any other symbol table. I'm not going to complicate the outline with scopes, though.):

labeled-statement:  identifier
                           { labeled-statement.label := define_label_name(identifier.string); }
                    ':' statement

It seems like we need to create the attribute in the middle of the rule, and I did that in the example above, because it's legal (although obviously bad style) for a statement to use its own label:

    again: if (try() != SUCCEED) { goto again; }

However, it's not really required because C does not require that labels be declared before use. (In fact, it provides no mechanism for declaring a label without attaching it to a statement.) So the parser needs to be able to deal with as-yet-undefined labels anywhere a label is used. The most likely solution is to look up or enter the label into the label symbol table in an attribute rule for jump-statement.

None of that seems particularly complicated. I hope you agree.

What might be complicated is adding members to a newly-created aggregate type. The declarations of members don't look any different from the declaration of a local or global variable; the only difference is that the member declarations are parsed inside a struct-declaration-list. As shown above, the symbol table for the aggregate type has already been created (in new_tag_symbol), and is accessible through inheritance from the struct-declaration-list. How this chain of inherited attributes eventually arrives at the declaration is going to vary a lot from compiler to compiler, but the standard techniques for implementing inherited attribute chains will be applicable.

What is the highest stage in compilation, where a "name" can be entered into the symbol table?

Answers (1)

Related Questions

What is the highest stage in compilation, where a &quot;name&quot; can be entered into the symbol table?

Answers (1)

Related Questions

What is the highest stage in compilation, where a "name" can be entered into the symbol table?