Reputation: 376
I am writing a JavaCC parser/lexer which is meant to recognise all input strings belonging to either of the following languages L0 or L1:
L0
A string from L0 consists of several blocks separated by space characters.
At least one block must be present (i.e., no input consisting only of some number of white spaces is allowed).
L1
A string from L1 consists of several blocks separated by space characters.
At least one block must be present.
(A-Z). A block of the second kind must have the shape <2U>. . .</2U>, where . . . stands for any string from L0.
This is my code so far:
PARSER_BEGIN(Assignment)
/** A parser which determines if user's input belongs to L0 or L1. */
public class Assignment {
    public static void main(String[] args) {
        String returnString = null;
        boolean toPrintEmptyInput = false;
        try {
            Assignment parser = new Assignment(System.in);
            if (parser.Input()) {
                System.out.println("YES"); // If the user's input belongs to L0, print YES.
            } else {
                System.out.println("NO");
            }
        } catch (ParseException e) {
            System.out.println("NO"); // If the user's input does not belong to L0, print NO.
        }
    }
}
PARSER_END(Assignment)
/** A token which matches any lowercase letter from the English alphabet. */
TOKEN :
{
    < IDLOWER: (["a"-"z"]) >
}
/** A token which matches any uppercase letter from the English alphabet. */
TOKEN :
{
    < IDUPPER: (["A"-"Z"]) >
}
/** A token which matches a single white space. */
TOKEN :
{
    < WHITESPACE: " " >
}
/** This production is the basis for the construction of strings which belong to language L0. */
boolean Input() :
{}
{
    <IDLOWER>(<IDLOWER><IDLOWER>)* ((<WHITESPACE>(<WHITESPACE><WHITESPACE>)*)<IDLOWER>(<IDLOWER><IDLOWER>)*)* ("\n"|"\r") <EOF>
    {
        return true;
    }
|
    {
        return false;
    }
}
/** This production is the basis for the construction of strings which belong to language L1. */
void Input2() :
{}
{
    Input() ((<WHITESPACE> Input())* (<WHITESPACE> (<IDUPPER><IDUPPER>)+)*)* ("\n"|"\r") <EOF>
|
    (<IDUPPER><IDUPPER>)+ ((<WHITESPACE> (<IDUPPER><IDUPPER>)+)* (<WHITESPACE> Input())*)* ("\n"|"\r") <EOF>
}
Issue:
The issue that I am having is that, when I run javacc on Assignment.jj, the following is printed out on the terminal: Expansion within "(. . .)*" can be matched by empty string.
I have looked at the following links, in order to try to better understand this error:
The second link recommended modifying the . . . within the expansion so that it can no longer match the empty string. However, I am struggling to do this while still having a production which accepts strings in L1.
I would appreciate hints or corrections!
Upvotes: 1
Views: 1641
Reputation: 241691
In the rule for Input2(), the pattern contained inside (...)* in:
((<WHITESPACE> Input())* (<WHITESPACE> (<IDUPPER><IDUPPER>)+)*)*
could be matched by the empty string.
You can reduce the expansion to the form (A* B*)*, where A is <WHITESPACE> Input() and B is <WHITESPACE> (<IDUPPER><IDUPPER>)+. A* B* can match the empty string regardless of what A and B are, because each starred part may repeat zero times.
JavaCC does not allow (...)* expansions if the enclosed expression can match the empty string, which is what the error message is trying to tell you.
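To make the "can match the empty string" reasoning concrete, here is a minimal Java sketch of that kind of check. It is only an illustration, not JavaCC's actual implementation: the NullableCheck class and its helper methods are hypothetical names, and Input() is treated as an opaque token because it is preceded by <WHITESPACE> in both A and B anyway.

```java
/** Toy model of grammar expansions; it only answers "can this match the empty string?". */
class NullableCheck {
    interface Expansion {
        boolean canMatchEmpty();
    }

    // A single token such as <WHITESPACE>: it always consumes input.
    static Expansion token(String name) {
        return () -> false;
    }

    // A sequence can match the empty string only if every part can.
    static Expansion seq(Expansion... parts) {
        return () -> {
            for (Expansion p : parts) {
                if (!p.canMatchEmpty()) return false;
            }
            return true;
        };
    }

    // A choice can match the empty string if any alternative can.
    static Expansion choice(Expansion... alts) {
        return () -> {
            for (Expansion a : alts) {
                if (a.canMatchEmpty()) return true;
            }
            return false;
        };
    }

    // X* may repeat zero times, so it can always match the empty string.
    static Expansion star(Expansion inner) {
        return () -> true;
    }

    // X+ needs at least one X, so it is empty-matchable only if X is.
    static Expansion plus(Expansion inner) {
        return inner::canMatchEmpty;
    }

    public static void main(String[] args) {
        // Assumption: "Input" stands in for the Input() non-terminal.
        Expansion a = seq(token("WHITESPACE"), token("Input"));
        Expansion b = seq(token("WHITESPACE"),
                          plus(seq(token("IDUPPER"), token("IDUPPER"))));

        // Body of the rejected loop (A* B*)*
        System.out.println("A* B* can match empty: " + seq(star(a), star(b)).canMatchEmpty()); // true
        // Body of a loop over a plain choice, (A | B)*
        System.out.println("A | B can match empty: " + choice(a, b).canMatchEmpty());          // false
    }
}
```

The body A* B* is reported as empty-matchable, while a body that must consume at least a <WHITESPACE> token is not.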
A reasonable alternative might be:
(A | B)*
Since in this case both A and B start with <WHITESPACE>, JavaCC's default one-token lookahead cannot tell the two alternatives apart, so the common prefix has to be factored out:
(<WHITESPACE> ( Input() | (<IDUPPER><IDUPPER>)+ ) )*
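As a rough sanity check that this rewrite does not change which strings are accepted, the three shapes can be compared using java.util.regex as a stand-in for the JavaCC expansions. This is a sketch under assumptions: the StarRewriteDemo class is hypothetical, and A and B are simplified to "a space plus a lowercase block" and "a space plus pairs of uppercase letters" rather than the real Input() production.

```java
import java.util.regex.Pattern;

/** Compares (A* B*)*, (A | B)* and the factored form on a few sample strings. */
class StarRewriteDemo {
    // Simplified stand-ins for the two expansions (assumptions, not the real grammar).
    static final String A = "(?: [a-z]+)";           // <WHITESPACE> followed by a lowercase block
    static final String B = "(?: (?:[A-Z]{2})+)";    // <WHITESPACE> followed by pairs of uppercase letters

    static final Pattern NESTED   = Pattern.compile("(?:" + A + "*" + B + "*)*");
    static final Pattern CHOICE   = Pattern.compile("(?:" + A + "|" + B + ")*");
    static final Pattern FACTORED = Pattern.compile("(?: (?:[a-z]+|(?:[A-Z]{2})+))*");

    // True when all three patterns give the same verdict for s.
    static boolean allAgree(String s) {
        boolean nested = NESTED.matcher(s).matches();
        return nested == CHOICE.matcher(s).matches()
            && nested == FACTORED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "", " abc", " AB cd", " cd ABCD ef", " ABC", "abc" };
        for (String s : samples) {
            System.out.println("\"" + s + "\""
                + " accepted=" + FACTORED.matcher(s).matches()
                + " allAgree=" + allAgree(s));
        }
    }
}
```

A regex engine accepts the nested form without complaint, so this only illustrates that the accepted strings stay the same. The restriction on empty-matchable loop bodies is specific to JavaCC.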
Upvotes: 3