Reputation: 373
The grammar, in a pinch, is as follows:
Phi ::= Phi_sub ( ("&&" | "||") Phi_sub )*
Phi_sub ::= "(" Phi ")" | ...
Psi ::= Psi_sub ( ("&&" | "||") Psi_sub )*
Psi_sub ::= "(" Psi ")" | ...
Xi ::= LOOKAHEAD( Phi ) Phi | LOOKAHEAD( Psi ) Psi
As you can see, an infinite lookahead would in general be required in the Xi production, because the parser needs to distinguish cases like:
((Phi_sub && Phi_sub) || Phi_sub)
vs ((Psi_sub && Psi_sub) || Psi_sub)
i.e. an arbitrary amount of prefixing (
.
I thought, that making the lookahead like above would work, but it doesn't. For example, Phi is chosen, even if Xi does not expand to Phi, but does to Psi. This can be easily checked on a certain stream S by calling Phi with the debugger just after the parsed decided, within Xi, to choose Phi, and is about to call Phi. The debugger in such a case shows a proper expansion to Psi, while allowing the parser just to call Phi as it wants would cause a parse exception.
The other way of testing it is swapping Phi and Psi:
Xi ::= LOOKAHEAD( Psi ) Psi | LOOKAHEAD( Phi ) Phi
This will make the parser parse the particular S correctly, and so it seems that simply the first branch within Xi is chosen, be it the valid one or not.
I guess I got some basic assumption wrong, but have no idea what can it be. Should the above work in general, if there are no additional factors, like an ignored inner lookahead?
Upvotes: 1
Views: 93
Reputation: 16221
Your assumptions are not wrong. What you are trying to do should work. And it should work for the reasons you think it should work.
Here is a complete example written in JavaCC.
void Start() : {} { Xi() <EOF> }
void Xi() : {} {
LOOKAHEAD( Phi() ) Phi() { System.out.println( "Phi" ) ; }
| LOOKAHEAD( Psi() ) Psi() { System.out.println( "Psi" ) ; }
}
void Phi() : {} { Phi_sub() ( ("&&" | "||") Phi_sub() )*}
void Phi_sub() : {} { "(" Phi() ")" | "Phi_sub" }
void Psi() : {} { Psi_sub() ( ("&&" | "||") Psi_sub() )* }
void Psi_sub() : {} { "(" Psi() ")" | "Psi_sub" }
And here is some sample output:
Input is : <<Phi_sub>>
Phi
Input is : <<Psi_sub>>
Psi
Input is : <<((Phi_sub && Phi_sub) || Phi_sub)>>
Phi
Input is : <<((Psi_sub && Psi_sub) || Psi_sub)>>
Psi
The problem you are having lies in something not shown in the question.
By the way, it's a bad idea to put a lookahead specification in front of every alternative.
void X() : {} { LOOKAHEAD(Y()) Y() | LOOKAHEAD(Z()) Z() }
is roughly equivalent to
void X() : {} { LOOKAHEAD(Y()) Y() | LOOKAHEAD(Z()) Z() | fail with a stupid error message }
For example, here is another run of the above grammar
Input is : <<((Psi_sub && Psi_sub) || Phi_sub)>>
NOK.
Encountered "" at line 1, column 1.
Was expecting one of:
After all lookahead has failed, the parser is left with an empty set of expectations!
If you change Xi
to
void Xi() : {} {
LOOKAHEAD( Phi() ) Phi() { System.out.println( "Phi" ) ; }
| Psi() { System.out.println( "Psi" ) ; }
}
you get a slightly better error message
Input is : <<((Psi_sub && Psi_sub) || Phi_sub)>>
NOK.
Encountered " "Phi_sub" "Phi_sub "" at line 1, column 26.
Was expecting one of:
"(" ...
"Psi_sub" ...
You can also make a custom error message
void Xi() : {} {
LOOKAHEAD( Phi() ) Phi() { System.out.println( "Phi" ) ; }
| LOOKAHEAD( Psi() ) Psi() { System.out.println( "Psi" ) ; }
| { throw new ParseException( "Expected either a Phi or a Psi at line "
+ getToken(1).beginLine
+ ", column " + getToken(1).beginColumn + "." ) ;
}
}
Upvotes: 1