Caleb Owusu-Yianoma

Reputation: 376

Finding error in JavaCC parser/lexer code

I am writing a JavaCC parser/lexer which is meant to recognise all input strings in the following language L:

A string from L consists of several blocks separated by space characters.
At least one block must be present (i.e., no input consisting only of some number of white spaces is allowed).

The I/O specification includes the following:

If the input does represent a string from L, then the word YES must be printed out to System.out, ending with the EOL character.

If the input is not in L, then only a single line with the word NO needs to be printed out to System.out, also ending with the EOL character.
In addition, a brief error message should be printed out on System.err explaining the reason why the input is not in L.

Issue:

This is my current code:

PARSER_BEGIN(Assignment)

  /** A parser which determines whether the user's input belongs to the language L. */
  public class Assignment {
    public static void main(String[] args) {
      try {
        Assignment parser = new Assignment(System.in);
        parser.Input();
        if(parser.Input()) {
          System.out.println("YES"); // If the user's input belongs to L, print YES.
        } else if(!(parser.Input())) {
          System.out.println("NO");
          System.out.println("Empty input");
        }
      } catch (ParseException e) {
        System.out.println("NO");  // If the user's input does not belong to L, print NO.       
      }
    }
  }

PARSER_END(Assignment)

/** A token which matches any single lowercase letter of the English alphabet. */
TOKEN :
{
 < ID: (["a"-"z"]) >
}

/** A token which matches a single space character. */
TOKEN : 
{
  <WHITESPACE: " ">
}

/** This production is the basis for the construction of strings which belong to language L. */
boolean Input() :
{}
{
  <ID>(<ID><ID>)* ((<WHITESPACE>(<WHITESPACE><WHITESPACE>)*)<ID>(<ID><ID>)*)* ("\n"|"\r") <EOF>
  {      
    System.out.println("ABOUT TO RETURN TRUE");
    return true;    
  }

  |

  {    
    System.out.println("ABOUT TO RETURN FALSE");
    return false;
  }
}

The issue that I am having is as follows:

I am trying to write code which will ensure that:

At the moment, when I input the string "jjj jjj jjj", which is in L by definition, and follow it with a carriage return and an EOF [CTRL + D], the text NO Empty input is printed out. I did not expect this to happen.

In an attempt to resolve the issue I added the ...TRUE and ...FALSE print statements to my production (see code above). When I entered the same string of js, the terminal printed the ...TRUE statement once, immediately followed by two occurrences of the ...FALSE statement.
Then the text NO Empty input was printed out, as before.

I have also searched online to check whether I am misusing the OR symbol | in my production Input(), or misusing the return keyword, but this has not helped.

Could I please have a hint for resolving this issue?

Upvotes: 1

Views: 330

Answers (2)

sepp2k

Reputation: 370112

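You're calling the Input method three times: once on its own line, once in the if condition and once in the else if condition. The first call reads from stdin until it reaches the end of the stream, parses the input successfully and returns true. By the time of the other two calls the stream is already exhausted, so the first alternative cannot match and the empty alternative returns false.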

You shouldn't call a rule multiple times unless you actually want it to be applied multiple times (which only makes sense if the rule only consumes part of the input rather than going until the end of the stream). Instead when you need the result in multiple places, just call the method once and store the result in a variable.

Or in your case you could just call it once in the if and no variable would even be needed:

Assignment parser = new Assignment(System.in);
if(parser.Input()) {
  System.out.println("YES"); // If the user's input belongs to L, print YES.
} else {
  System.out.println("NO");
  System.out.println("Empty input");
}
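
If you do want the result in a variable, the call-once pattern could look roughly like the sketch below. It is not JavaCC output: CallOnceSketch, input and verdict are hypothetical names, and input is only a stand-in that drains the stream the way Input() does and then checks the text against a regular expression that is my reading of your grammar (odd-length blocks of a-z, separated by odd numbers of spaces, ending in a single \n or \r).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CallOnceSketch {

    /** Hypothetical stand-in for the generated Input(): drains the stream, then checks the text. */
    public static boolean input(InputStream in) {
        StringBuilder text = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {   // consumes everything up to end of stream
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumed reading of the grammar: odd-length blocks of a-z, separated by
        // an odd number of spaces, terminated by exactly one \n or \r.
        return text.toString()
                   .matches("[a-z]([a-z]{2})*( ( {2})*[a-z]([a-z]{2})*)*[\\n\\r]");
    }

    /** Calls input() exactly once and reuses the stored result. */
    public static String verdict(InputStream in) {
        boolean accepted = input(in);         // the only call that touches the stream
        if (accepted) {
            return "YES";
        }
        System.err.println("Input is not in L"); // diagnostics go to System.err
        return "NO";
    }

    public static void main(String[] args) {
        InputStream sample = new ByteArrayInputStream(
                "jjj jjj jjj\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(verdict(sample));  // prints YES
    }
}
```

The stand-in returns false for every non-matching input, whereas your generated parser would throw a ParseException for most malformed input. Only the structure of verdict matters here: one call, one stored boolean, and the explanation sent to System.err as your specification asks.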

Upvotes: 2

Theodore Norvell

Reputation: 16221

When the input is jjj jjj jjj followed by a newline or carriage return (but not both), your main method invokes parser.Input() three times.

  • The first time, your parser consumes all the input and returns true.
  • The second and third times, all the input having already been consumed, the parser returns false.

Once the input is consumed, the lexer will just keep returning <EOF> tokens.
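
The same effect can be seen without JavaCC at all. The sketch below is plain Java rather than generated parser code, and ExhaustedStreamDemo and drain are hypothetical names: drain reads a stream to its end, and calling it three times on the same stream mirrors the three calls to Input().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExhaustedStreamDemo {

    /** Reads the stream to its end and returns how many bytes were still left in it. */
    public static int drain(InputStream in) {
        try {
            int count = 0;
            while (in.read() != -1) {   // read() keeps returning -1 once the end is reached
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "jjj jjj jjj\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(drain(in)); // 12: the first call sees the whole line
        System.out.println(drain(in)); // 0: nothing is left for the second call
        System.out.println(drain(in)); // 0: nor for the third
    }
}
```

The first call gets all the input and the later calls get none, which matches the one ...TRUE line followed by two ...FALSE lines that you observed.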

Upvotes: 2
