Reputation: 376
I am creating a lexer/parser which should accept strings that belong to an infinite set of languages.
One such string is "a <2L>AA <2U>a <2L>AA <2U>a</2U></2L></2U></2L>"
.
The set of languages is defined as follows:
Base language, L0
Example of string belonging to L0:
zyx abcba m xyzvv
There is one space character between zyx
and abcba
, there are three spaces
between abcba
and m
, and only one between m
and xyzvv
. No other space characters are present in the string.
Language L1
<2U>. . .</2U>
, where . . .
stands
for any string from L0.Example of string belonging to L1:
YZ <2U>abc zzz</2U> ABBA <2U>kkkkk</2U> KM
Note that five spaces separate YZ
and <2U>abc zzz</2U>
, and three spaces divide abc
from zzz
. Otherwise single spaces are used as separators. There is no space in front of YZ
and no space follows KM
.
Language L2
<2L>. . .</2L>
, where . . .
stands
for any string from L1.Example of string belonging to L2:
abc <2L>AA ZZ <2U>a bcd</2U></2L> z <2L><2U>abcde</2U></2L>
Single spaces are used as separators inside the sentence given above, but any other odd number of spaces would also lead to a valid L2 sentence.
Languages L{2k + 1}, k > 0
<2U>. . .</2U>
, where . . .
stands
for any string from L{2k}.Languages L{2k + 2}, k > 0
<2L>. . .</2L>
, where . . .
stands
for any string from L{2k + 1}.The code for my lexer/parser is as follows:
PARSER_BEGIN(Assignment)
/** A parser which determines if user's input belongs to any one of the set of acceptable languages. */
public class Assignment {
public static void main(String[] args) {
try {
Assignment parser = new Assignment(System.in);
parser.Start();
System.out.println("YES"); // If the user's input belongs to any of the set of acceptable languages, then print YES.
} catch (ParseException e) {
System.out.println("NO"); // If the user's input does not belong to any of the set of acceptable languages, then print NO.
}
}
}
PARSER_END(Assignment)
//** A token which matches any lowercase letter from the English alphabet. */
TOKEN :
{
< #L_CASE_LETTER: ["a"-"z"] >
}
//* A token which matches any uppercase letter from the English alphabet. */
TOKEN:
{
< #U_CASE_LETTER: ["A"-"Z"] >
}
//** A token which matches an odd number of lowercase letters from the English alphabet. */
TOKEN:
{
< ODD_L_CASE_LETTER: <L_CASE_LETTER>(<L_CASE_LETTER><L_CASE_LETTER>)* >
}
//** A token which matches an even number of uppercase letters from the English alphabet. */
TOKEN:
{
< EVEN_U_CASE_LETTERS: (<U_CASE_LETTER><U_CASE_LETTER>)+ >
}
//* A token which matches the string "<2U>" . */
TOKEN:
{
< OPEN_UPPER: "<2U>" >
}
//* A token which matches the string "</2U>". */
TOKEN:
{
< CLOSE_UPPER: "</2U>" >
}
//* A token which matches the string "<2L>". */
TOKEN:
{
< OPEN_LOWER: "<2L>" >
}
//* A token which matches the string "</2L>". */
TOKEN:
{
< CLOSE_LOWER: "</2L>" >
}
//* A token which matches an odd number of white spaces. */
TOKEN :
{
< ODD_WHITE_SPACE: " "(" "" ")* >
}
//* A token which matches an EOL character. */
TOKEN:
{
< EOL: "\n" | "\r" | "\r\n" >
}
/** This production matches strings which belong to the base language L^0. */
void Start() :
{}
{
LOOKAHEAD(3)
<ODD_L_CASE_LETTER> (<ODD_WHITE_SPACE> <ODD_L_CASE_LETTER>)* <EOL> <EOF>
|
NextLanguage()
|
LOOKAHEAD(3)
NextLanguageTwo()
|
EvenLanguage()
}
/** This production matches strings which belong to language L^1. */
void NextLanguage():
{}
{
(<OPEN_UPPER> (PseudoStart()) <CLOSE_UPPER>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())* <EOL> <EOF>
|
(<EVEN_U_CASE_LETTERS>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())* <EOL> <EOF>
}
/** This production matches either an even number of uppercase letters, or a string from L^0, encased within the tags <2U> and </2U>. */
void UpperOrPseudoStart() :
{}
{
<EVEN_U_CASE_LETTERS>
|
<OPEN_UPPER> (PseudoStart()) <CLOSE_UPPER>
}
/** This production matches strings from L^0, in a similar way to Start(); however, the strings that it matches do not have EOL or EOF characters after them. */
void PseudoStart() :
{}
{
<ODD_L_CASE_LETTER> (<ODD_WHITE_SPACE> <ODD_L_CASE_LETTER>)*
}
/** This production matches strings which belong to language L^2. */
void NextLanguageTwo() :
{}
{
(<ODD_L_CASE_LETTER>)+ (<ODD_WHITE_SPACE> LowerOrPseudoNextLanguage())* <EOL> <EOF>
|
(<OPEN_LOWER> PseudoNextLanguage() <CLOSE_LOWER>)+ (<ODD_WHITE_SPACE> LowerOrPseudoNextLanguage())* <EOL> <EOF>
}
/** This production matches either an odd number of lowercase letters, or a string from L^1, encased within the tags <2L> and </2L>. */
void LowerOrPseudoNextLanguage() :
{}
{
<ODD_L_CASE_LETTER>
|
<OPEN_LOWER> PseudoNextLanguage() <CLOSE_LOWER>
}
/** This production matches strings from L^1, in a similar way to NextLanguage(); however, the strings that it matches do not have EOL or EOF characters after them. */
void PseudoNextLanguage() :
{}
{
(<OPEN_UPPER> (PseudoStart()) <CLOSE_UPPER>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())*
|
(<EVEN_U_CASE_LETTERS>)+ (<ODD_WHITE_SPACE> UpperOrPseudoStart())*
}
/** This production matches strings which belong to any of the languages L^{2k + 2}, where k > 0 (the infinite set of even languages). */
void EvenLanguage() :
{}
{
(<ODD_L_CASE_LETTER>)+ (<ODD_WHITE_SPACE> EvenLanguageAuxiliary())* <EOL> <EOF>
|
(CommonPattern())+ (<ODD_WHITE_SPACE> EvenLanguageAuxiliary())* <EOL> <EOF>
}
/** This production is an auxiliary production that helps when parsing strings from any of the even set of languages. */
void EvenLanguageAuxiliary() :
{}
{
CommonPattern()
|
<ODD_L_CASE_LETTER>
}
void CommonPattern() :
{}
{
<OPEN_LOWER> <EVEN_U_CASE_LETTERS> <ODD_WHITE_SPACE> <OPEN_UPPER> <ODD_L_CASE_LETTER> (<ODD_WHITE_SPACE> CommonPattern())+ <CLOSE_UPPER> <CLOSE_LOWER>
}
Several times now, I have inputted the string "a <2L>AA <2U>a <2L>AA <2U>a</2U></2L></2U></2L>"
.
However, each time, NO
is printed out on the terminal.
I have looked through my code carefully several times, checking the order in which I think the input string should be parsed; but, I haven't been able to find any errors in my logic or reasons why the string isn't being accepted.
Could I have some suggestions as to why it isn't being accepted, please?
Upvotes: 0
Views: 484
Reputation: 376
The following steps helped to solve the problem.
javacc -debug_parser Assignment.jj
javac Assignment*.java
java Assignment
) and then input the string:"a <2L>AA <2U>a <2L>AA <2U>a</2U></2L></2U></2L>"
NextLangaugeTwo()
is called on this string, rather than the desired EvenLanguage()
production. NextLangaugeTwo()
shows that it matches the first eight tokens in the input string.Start()
production by changing the second lookahead value (just above the call to NextLanguageTwo()
) from 3 to 9. Upvotes: 1
Reputation: 462
Are any of your inputs being accepted? I have copied your code over to my computer and have found that any correct input (as far as I can tell from the definition of your language), it always outputs 'NO'.
Upvotes: 0