Tyler Klement
Tyler Klement

Reputation: 41

Basic recursive descent parser for the set of closed parentheses

How do I make a recursive descent LL(1) parser (I'm using C++) for the following grammar:

S -> SS  
S -> (S)  
S -> epsilon

I've made an attempt using recursive functions, but I think it's completely wrong. I know that the rules need to be implemented in order, but how do I get the first rule to check for other instances of the first rule without going on an infinite loop? In my code I have it only check for subsequent rules.

My code:

bool rule1(tokenizer * tp){ // S -> SS
    return rule2(tp) && rule2(tp);
}

bool rule2(tokenizer * tp){ // S -> (S)
    bool returnVal = false;

    if(tp-> lookahead. front( ). type == tkn_LPAR){
        tokenizer tmpTknzr = &tp;       
        tp-> lookahead. pop_front( );
        if(tp-> lookahead. front( ). type == tkn_RPAR){ // S -> epsilon
            tp-> lookahead. pop_front( );
            returnVal = true;
        } else {
            if(evalS(tp) && tp-> lookahead. front( ). type == tkn_RPAR){
                tp-> lookahead. pop_front( );
                returnVal = true;
            }
        }
    }
    return returnVal;
}

bool evalS(tokenizer * tp){ // evaluate S
    return rule1(tp) || rule2(tp);
}

Upvotes: 1

Views: 769

Answers (2)

Adrian McCarthy
Adrian McCarthy

Reputation: 47954

The 1 in LL(1) means that you must be able to select the current rule to expand by looking ahead at no more than 1 token.

With the grammar as you've written it, you cannot determine which rule to expand when you look at the first token. If it's a '(', then you don't know if it's rule 2 or rule 1 where the first S expanded to an epsilon. That's why you can get stuck in infinite recursion.

The trick is to transform your grammar into a form where each rule that can apply to non-terminal X begins with a distinct terminal. Chris Dodd's answer showed one way to do that for your grammar.

Upvotes: 1

Chris Dodd
Chris Dodd

Reputation: 126193

The problem is that your grammar is not LL(1) -- in fact it is ambiguous, which makes parsing much harder. Try using an LL(1) grammar for parenthesis, like

S -> (S)S
S -> ε

Upvotes: 2

Related Questions