user207093

Reputation: 21

sequence of input tokens for syntax analyser

In a compiler, the first phase creates the tokens and sends them to the syntax analyzer.

I am confused about whether the syntax analyzer starts its process after getting all the tokens, or starts with only some of the tokens.

Consider the following program containing two functions:

void fun()
{
    printf("world");
}

int main()
{
    printf("hello");
    return 0;
}

Now the above program is tokenized as below:

<keyword, 1>
<identifier, 2>
<symbols, (>
<symbols, )>
<symbols, {>
<identifier, 3>
<symbols, (>
<string, 4>
<symbols, )>
<punctuation, ;>
<symbols, }>

<keyword, 5>
<identifier, 6>
<symbols, (>
<symbols, )>
<symbols, {>
<identifier, 7>
<symbols, (>
<string, 8>
<symbols, )>
<punctuation, ;>
<keyword, 9>
<constant, 10>
<punctuation, ;>
<symbols, }>

Now the tokens will be passed to the parser.

In this scenario, does the parser need the tokens of both functions, or will it complete the first function and then move on to the second? Also, in what sequence will the tokens be passed to the parser? If they have to go in sequence, does that mean the lexical analyzer needs to maintain the sequence of tokens?

Another doubt: is the parse tree created for the whole program, for each function, or for the compound statements?

Can anyone answer the above questions?

Upvotes: 2

Views: 463

Answers (2)

It is not crystal clear to me what you're unsure about, but generally a compiler can go either way:

The lexer can run up front, tokenizing the entire input file, and then pass the list of tokens to the parser. That's the mode the parser generator that I happen to use employs.
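As a rough illustration of that mode (only a sketch with made-up names, and a deliberately crude whitespace-based notion of a "token"), the lexer produces the complete token list before the parser ever runs:

import java.util.ArrayList;
import java.util.List;

// Sketch only: the whole source file is tokenized up front, and the
// finished list is then handed to the parser in one go.
class UpFrontLexer {
    static List<String> tokenizeAll(String source) {
        List<String> tokens = new ArrayList<>();
        for (String piece : source.split("\\s+"))   // crude: split on whitespace only
            if (!piece.isEmpty())
                tokens.add(piece);
        return tokens;                              // the parser receives this complete list
    }
}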

However, the more common approach is that the parser calls the lexer for new tokens as it needs them. Most parser systems also provide some sort of "unget" operation, so that the parser can push a token back to the lexer in case it went down the wrong path in the grammar.
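The on-demand style might look roughly like this (again only a sketch; the class and method names and the single-slot push-back buffer are my own illustration, not taken from any particular tool):

// Sketch of a pull-style lexer: the parser calls nextToken() whenever it
// needs one more token, and can unget() a token it does not want after all.
class OnDemandLexer {
    private final String input;
    private int pos = 0;
    private String pushedBack = null;   // single-token "unget" buffer

    OnDemandLexer(String input) { this.input = input; }

    String nextToken() {
        if (pushedBack != null) {       // hand back the pushed-back token first
            String t = pushedBack;
            pushedBack = null;
            return t;
        }
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos)))
            pos++;                      // skip whitespace
        if (pos >= input.length())
            return null;                // end of input
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos)))
            pos++;                      // crude: a "token" is a whitespace-delimited chunk
        return input.substring(start, pos);
    }

    void unget(String token) {          // parser chose the wrong grammar path
        pushedBack = token;
    }
}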

For compiler systems that generate parse trees, the parse tree typically encompasses everything in a source file. Of course, within that parse tree there will be self-contained sub-trees for the functions, and within those, sub-trees for compound statements, and so on, depending on the language being parsed.
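Sketched as data (the class names here are invented purely for illustration), that nesting could look like:

import java.util.List;

// One root node per source file, with self-contained sub-trees below it.
class Program       { List<FunctionDef> functions; }     // whole translation unit
class FunctionDef   { String name; CompoundStmt body; }  // e.g. fun, main
class CompoundStmt  { List<Statement> statements; }      // { ... } block
interface Statement { }                                   // calls, return, nested blocks, ...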

Upvotes: 1

Anil Kumar

Reputation: 348

At present I am constructing a compiler.

The scanner object (the scanner divides the source file into tokens) is provided to the parser, and the scanner should have a method to get the next token. The parser calls scanner.getToken() whenever it requires one. For example, the Scanner class looks like this:

public class Scanner {
    private TokenType token;   // the most recently scanned token

    public TokenType getToken()
    {
        // Logic to divide the text into tokens goes here;
        // each call scans and returns the next token.
        return token;
    }
}

Now let us see it from the Parser side. I'll explain with an example. Let us parse the declaration int x; The grammar rule is variable-declaration ----> type-specifier IDENTIFIER SEMI, where type-specifier can be int or void.

public class Parser {
    private Scanner scanner;
    private TokenType current_token;

    boolean parse(Scanner scanr)
    {
        scanner = scanr;
        return parseDeclaration();
    }

    boolean parseDeclaration()
    {
        current_token = scanner.getToken();          // should be int or void
        if (current_token == TokenType.INT || current_token == TokenType.VOID)
        {
            current_token = scanner.getToken();      // should be an identifier
            if (current_token == TokenType.IDENTIFIER)
            {
                current_token = scanner.getToken();  // should be a semicolon
                if (current_token == TokenType.SEMICOLON)
                    return true;
            }
        }
        // if there is a syntax error
        return false;
    }
}
The above parser parses only declarations like

  1. int x;
  2. int y;

and returns true if it is correct, or false otherwise. Regarding the parse tree: constructing the parse tree greatly helps during the following phases. You are basically passing your source file on as a tree structure, after removing some unnecessary tokens during syntax analysis. So the syntax tree should be constructed for the whole program. If you want further technical details you can comment on this; I will be glad to reply.
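For completeness, a hypothetical driver for the two classes above might look like this (the Scanner constructor taking the source text is an assumption of mine; the answer only shows the getToken() stub):

public class Main {
    public static void main(String[] args) {
        // Assumed constructor: a Scanner built over the source text "int x;".
        Scanner scanner = new Scanner("int x;");
        Parser parser = new Parser();
        boolean ok = parser.parse(scanner);   // pulls tokens one by one via getToken()
        System.out.println(ok ? "declaration accepted" : "syntax error");
    }
}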

Upvotes: 0
