arataj
arataj

Reputation: 373

javacc' LOOKAHEAD( AllSymbols() ) AllSymbols() not chosen, sole to be parsed correctly

The grammar, in a pinch, is as follows:

Phi ::= Phi_sub ( ("&&" | "||") Phi_sub )*
Phi_sub ::= "(" Phi ")" | ...

Psi ::= Psi_sub ( ("&&" | "||") Psi_sub )*
Psi_sub ::= "(" Psi ")" | ...

Xi ::= LOOKAHEAD( Phi ) Phi | LOOKAHEAD( Psi ) Psi

As you can see, an infinite lookahead would in general be required in the Xi production, because the parser needs to distinguish cases like:

((Phi_sub && Phi_sub) || Phi_sub) vs ((Psi_sub && Psi_sub) || Psi_sub)

i.e. an arbitrary amount of prefixing (.

I thought, that making the lookahead like above would work, but it doesn't. For example, Phi is chosen, even if Xi does not expand to Phi, but does to Psi. This can be easily checked on a certain stream S by calling Phi with the debugger just after the parsed decided, within Xi, to choose Phi, and is about to call Phi. The debugger in such a case shows a proper expansion to Psi, while allowing the parser just to call Phi as it wants would cause a parse exception.

The other way of testing it is swapping Phi and Psi:

Xi ::= LOOKAHEAD( Psi ) Psi | LOOKAHEAD( Phi ) Phi

This will make the parser parse the particular S correctly, and so it seems that simply the first branch within Xi is chosen, be it the valid one or not.

I guess I got some basic assumption wrong, but have no idea what can it be. Should the above work in general, if there are no additional factors, like an ignored inner lookahead?

Upvotes: 1

Views: 93

Answers (1)

Theodore Norvell
Theodore Norvell

Reputation: 16221

Your assumptions are not wrong. What you are trying to do should work. And it should work for the reasons you think it should work.


Here is a complete example written in JavaCC.

void Start() : {} { Xi() <EOF> }

void Xi() : {} {
    LOOKAHEAD( Phi() ) Phi() { System.out.println( "Phi" ) ; }
|   LOOKAHEAD( Psi() ) Psi() { System.out.println( "Psi" ) ; }
}

void Phi() : {} { Phi_sub() ( ("&&" | "||") Phi_sub() )*}

void Phi_sub() : {} { "(" Phi() ")" | "Phi_sub" }

void Psi() : {} { Psi_sub() ( ("&&" | "||") Psi_sub() )* }

void Psi_sub() : {} { "(" Psi() ")" | "Psi_sub" }

And here is some sample output:

Input is : <<Phi_sub>>
Phi
Input is : <<Psi_sub>>
Psi
Input is : <<((Phi_sub && Phi_sub) || Phi_sub)>>
Phi
Input is : <<((Psi_sub && Psi_sub) || Psi_sub)>>
Psi

The problem you are having lies in something not shown in the question.


By the way, it's a bad idea to put a lookahead specification in front of every alternative.

void X() : {} { LOOKAHEAD(Y()) Y() | LOOKAHEAD(Z()) Z() }

is roughly equivalent to

void X() : {} { LOOKAHEAD(Y()) Y() | LOOKAHEAD(Z()) Z() | fail with a stupid error message }

For example, here is another run of the above grammar

Input is : <<((Psi_sub && Psi_sub) || Phi_sub)>>
NOK.
Encountered "" at line 1, column 1.
Was expecting one of:

After all lookahead has failed, the parser is left with an empty set of expectations!

If you change Xi to

void Xi() : {} {
    LOOKAHEAD( Phi() ) Phi() { System.out.println( "Phi" ) ; }
|   Psi() { System.out.println( "Psi" ) ; }
}

you get a slightly better error message

Input is : <<((Psi_sub && Psi_sub) || Phi_sub)>>
NOK.
Encountered " "Phi_sub" "Phi_sub "" at line 1, column 26.
Was expecting one of:
    "(" ...
    "Psi_sub" ...

You can also make a custom error message

void Xi() : {} {
    LOOKAHEAD( Phi() ) Phi() { System.out.println( "Phi" ) ; }
|   LOOKAHEAD( Psi() ) Psi() { System.out.println( "Psi" ) ; }
|   { throw new ParseException( "Expected either a Phi or a Psi at line "
                               + getToken(1).beginLine
                               + ", column " + getToken(1).beginColumn + "." ) ; 
    }
}

Upvotes: 1

Related Questions