Prasanna

Reputation: 1751

What values are stored in the symbol table in compiler construction?

In Compiler Construction by Aho, Ullman and Sethi, it is stated that the scanner (lexical analysis) reads the characters of the source program and groups them into meaningful sequences called lexemes. For each lexeme, the scanner outputs a token of the form <token-name, attribute-value>.

  e.g. position = initial + rate * 60

These characters are grouped into lexemes and mapped into tokens like this:

  1. position is a lexeme that is mapped to the token <id, 1>, where id is an abstract symbol for identifier and 1 points to the symbol table entry for position.
  2. initial is a lexeme that is mapped to the token <id, 2>, where 2 points to the symbol table entry for initial.

My question is: how are these tokens stored in the symbol table? We are only mapping lexemes to tokens like <id, 1>, <id, 2>, etc., so where are the values corresponding to these tokens stored in the symbol table? I am aware of what a symbol table is, but can somebody please tell me the signature of the ST used here? Is it something like <id, map<token-name, attribute-value>>? Also, for the id entries (identifiers), which data structure is used to store information related to each identifier, such as name, scope, size, and data type?

And at which stage is the ST generated? All stages (scanner, parser, semantic analyzer, etc.) of the compiler use the ST for reference.

Another question: when the parser asks for the next input token, does the scanner read that token from the ST or from the input data? Or does the attribute-value simply contain a pointer into the symbol table? Please help me understand.

Upvotes: 2

Views: 769

Answers (1)

rici

Reputation: 241671

During the lexical scan, the only information you have about a symbol is its spelling. So you can't do much more than intern the symbol, which avoids allocating the symbol's name dynamically more than once. (How useful this is depends a lot on your implementation language.)
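To make the interning step concrete, here is a minimal sketch of what a scanner might do for the `position = initial + rate * 60` example from the question. The names `SymbolTable`, `intern`, and `scan` are invented for illustration, the token set is limited to what that one statement needs, and the 1-based entry numbers are chosen only to match the textbook's <id, 1>, <id, 2> notation. Note that each entry holds nothing but the spelling at this point.

```python
import re


class SymbolTable:
    """Hypothetical interning table: one entry per distinct spelling."""

    def __init__(self):
        self._index = {}   # spelling -> entry number
        self.entries = []  # entry number - 1 -> attributes known so far

    def intern(self, name):
        """Return the entry number for name, creating the entry if needed."""
        if name not in self._index:
            # At scan time the spelling is the only attribute available.
            self.entries.append({"name": name})
            self._index[name] = len(self.entries)  # 1-based, as in <id, 1>
        return self._index[name]


# Identifiers, integer literals, and the three operators used in the example.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+*]))")


def scan(source, table):
    """Read tokens from the input text; identifiers are interned in table."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "id":
            tokens.append(("id", table.intern(text)))
        elif kind == "number":
            tokens.append(("number", int(text)))
        else:
            tokens.append((text, None))
    return tokens


table = SymbolTable()
tokens = scan("position = initial + rate * 60", table)
```

In this sketch the scanner always reads from the input text and only writes to the table; the attribute value of an `id` token is just an index that later phases can use to find the entry and attach more information to it.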

As the analysis continues, you will accumulate more information about each symbol. In most programming languages, the same name will be associated with multiple objects: some of the associations will be scoped (local variables) while others will be contextual (namespaces and aggregate members, for example). The precise meaning of each lexeme will need to be resolved, but that might not happen even during the initial syntactic parse. (For example, the name of a structure member will need to be associated with the actual member in the object which describes the structure's type, but until you've resolved the type of each expression, you won't know what the structure type is.)
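As a rough illustration of how one name can end up associated with several objects, the sketch below keeps scoped bindings in a chain of dictionaries and keeps structure members in a separate container keyed by type name. Everything here (`Scope`, `declare`, `lookup`, `struct_members`, and the recorded fields) is a hypothetical layout for the example, not a prescribed design.

```python
class Scope:
    """Hypothetical scoped table: each scope maps names to attribute dicts."""

    def __init__(self, parent=None):
        self.parent = parent
        self.bindings = {}

    def declare(self, name, **info):
        """Bind name in this scope; redeclaring in the same scope is an error."""
        if name in self.bindings:
            raise KeyError(f"{name!r} already declared in this scope")
        self.bindings[name] = info

    def lookup(self, name):
        """Search this scope, then each enclosing scope, innermost first."""
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise KeyError(f"{name!r} is not declared")


# Scoped association: a local 'rate' shadows the global one.
global_scope = Scope()
global_scope.declare("rate", kind="variable", type="float")
local_scope = Scope(parent=global_scope)
local_scope.declare("rate", kind="variable", type="int")

# Contextual association: 'rate' as a member can only be resolved once the
# type of the expression to the left of the '.' is known.
struct_members = {
    "Loan": {"rate": {"type": "double", "offset": 0}},
}


def lookup_member(struct_type, member):
    """Resolve a member name in the context of an already-known struct type."""
    return struct_members[struct_type][member]
```

Here `local_scope.lookup("rate")` and `lookup_member("Loan", "rate")` return different records for the same spelling, and the two containers do not even store the same fields.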

So there is no one answer to this question. There will likely be a lot of different containers in your compiler which associate a name with some collection of information, and they are not likely to all have the same data fields. All that will have to be fleshed out as you write the various phases of your compiler.

Upvotes: 4
