Reputation: 1345
I'm writing a lexical analyzer class that tokenizes the characters of an input stream, and I use System.in.read()
to read the characters. The documentation says that it returns -1
when the end of the stream is reached, but I cannot understand why the behaviour differs with different input. For example, delete.txt
has the input:
1. I have
2. bulldoz//er
Then the Lexer
produces the correct tokenization:
[I=257, have=257, false=259, er=257, bulldoz=257, true=258]
but if I now append some blank lines with Enter,
the code goes into an infinite loop. The code checks the input for newlines and spaces, so how does that check get bypassed? :
1. I have
2. bulldoz//er
3.
The full code is:
package lexer;
import java.io.*;
import java.util.*;
import lexer.Token;
import lexer.Num;
import lexer.Tag;
import lexer.Word;

class Lexer {
    public int line = 1;
    private char null_init = ' ';
    private char tab = '\t';
    private char newline = '\n';
    private char peek = null_init;
    private char comment1 = '/';
    private char comment2 = '*';
    private Hashtable<String, Word> words = new Hashtable<>();

    // no-args constructor
    public Lexer() {
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    void reserve(Word word_obj) {
        words.put(word_obj.lexeme, word_obj);
    }

    char read_buf_char() throws IOException {
        char x = (char) System.in.read();
        return x;
    }

    /* tokenization done here */
    public Token scan() throws IOException {
        for (;;) {
            // while exiting the loop, sometimes the comment
            // characters are read, e.g. in bulldoz//er,
            // which are lost if the buffer is read;
            // so read the buffer
            peek = read_buf_char();
            if (peek == null_init || peek == tab) {
                peek = read_buf_char();
                System.out.println("space is read");
            } else if (peek == newline) {
                peek = read_buf_char();
                line += 1;
            } else {
                break;
            }
        }
        if (Character.isDigit(peek)) {
            int v = 0;
            do {
                v = 10 * v + Character.digit(peek, 10);
                peek = read_buf_char();
            } while (Character.isDigit(peek));
            return new Num(v);
        }
        if (Character.isLetter(peek)) {
            StringBuffer b = new StringBuffer(32);
            do {
                b.append(peek);
                peek = read_buf_char();
            } while (Character.isLetterOrDigit(peek));
            String buffer_string = b.toString();
            Word reserved_word = (Word) words.get(buffer_string); // returns null if not found
            if (reserved_word != null) {
                return reserved_word;
            }
            reserved_word = new Word(Tag.ID, buffer_string);
            // put key-value pair in the words hashtable
            words.put(buffer_string, reserved_word);
            return reserved_word;
        }
        // if the character read is not a digit or a letter,
        // then the character read is a new token
        Token t = new Token(peek);
        peek = ' ';
        return t;
    }

    private char get_peek() {
        return (char) this.peek;
    }

    private boolean reached_buf_end() {
        // reached end of buffer
        if (this.get_peek() == (char) -1) {
            return true;
        }
        return false;
    }

    public void run_test() throws IOException {
        // loop checking variable:
        // a token object is initialized with a dummy value
        Token new_token = null;
        // while end of stream has not been reached
        while (this.get_peek() != (char) -1) {
            new_token = this.scan();
        }
        System.out.println(words.entrySet());
    }

    public static void main(String[] args) throws IOException {
        Lexer tokenize = new Lexer();
        tokenize.run_test();
    }
}
The get_peek
function returns the value of peek,
which holds the current input buffer character.
The check for whether the end of the buffer has been reached is done in the run_test
function.
The main processing is done in the scan()
function.
I used the following command: cat delete.txt|java lexer/Lexer
to provide the file as input to the compiled Java class. Can you please tell me why this code goes into an infinite loop once the newline is added to the input file?
Upvotes: 0
Views: 110
Reputation: 44
I am not sure your end-of-stream check (-1) can ever succeed here. At the end of scan() you assign "peek" back to a space, and I think that is what breaks things when you have a blank line at the end: by the time run_test() tests peek, the -1 has already been overwritten, so you can never catch it.
Upvotes: 1
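To expand on this a little: System.in.read() returns an int precisely so that -1 (end of stream) can be distinguished from real characters; casting it to char turns -1 into '\uffff' (65535). The run_test() comparison only works as long as nothing overwrites peek afterwards, and the peek = ' ' reset at the end of scan() does exactly that on the blank-line path. A minimal sketch of an EOF-safe reader (SafeReader is a hypothetical helper, not part of the question's code) that keeps the raw int and exposes an explicit EOF flag:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: keep the raw int returned by read(), so EOF (-1)
// survives, and expose an explicit atEof() instead of overloading peek.
class SafeReader {
    static final int EOF = -1;
    private final InputStream in;
    private int peek = ' '; // raw int, not char

    SafeReader(InputStream in) { this.in = in; }

    // Advance one character; returns false once the stream is exhausted.
    boolean advance() throws IOException {
        peek = in.read(); // -1 stays -1 here
        return peek != EOF;
    }

    boolean atEof() { return peek == EOF; }

    char current() { return (char) peek; } // only valid while !atEof()

    public static void main(String[] args) throws IOException {
        // (char) -1 is '\uffff', which is why comparing chars "works"
        // only until peek is overwritten with ' ' somewhere else.
        System.out.println((int) (char) -1); // 65535

        // Input ending in blank lines, like the failing delete.txt case.
        SafeReader r = new SafeReader(new ByteArrayInputStream("ab\n\n".getBytes()));
        StringBuilder seen = new StringBuilder();
        while (r.advance()) {
            seen.append(r.current());
        }
        System.out.println(r.atEof()); // true: the loop terminates
    }
}
```

With this shape, the driver loop tests atEof() rather than a char value, so no later assignment to peek can hide the end of the stream.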