Reputation: 1857
I'm using Java's StreamTokenizer to tokenize source code given as text input.
When escape characters appear in a string, the tokenizer unescapes them, while I want to keep the string the same.
For example:
Input: String str = "STRIN\tG";
StreamTokenizer Output: STRIN G
Wanted Output: STRIN\tG
My code:
    BufferedReader reader = new BufferedReader(new FileReader("test.java"));
    StreamTokenizer tokenizer = new StreamTokenizer(reader);
    boolean eof = false;
    do {
        int type = tokenizer.nextToken();
        switch (type) {
            case StreamTokenizer.TT_EOF:
                eof = true;
                break;
            case '"':
                System.out.println(tokenizer.sval);
                break;
        }
    } while (!eof);
EDIT
I chose to work with StreamTokenizer because it handles comment removal well.
Upvotes: 4
Views: 563
Reputation: 1607
Add the default case and handle the character as you wish:
    switch (type) {
        case StreamTokenizer.TT_EOL:
            System.out.println("End of Line encountered.");
            break;
        case StreamTokenizer.TT_WORD:
            System.out.print(tokenizer.sval);
            break;
        case StreamTokenizer.TT_EOF:
            eof = true;
            break;
        case '"':
            System.out.println(tokenizer.sval);
            break;
        default:
            System.out.print((char) type);
    }
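For reference, here is a minimal self-contained sketch of that dispatch. The class name TokenDump and the helper dump() are made up for illustration, and the input is read from a string instead of a file. Two things are assumptions on top of the snippet above: eolIsSignificant(true) is switched on, because otherwise TT_EOL is not returned, and a TT_NUMBER case is added, because casting a negative token type to char would not print anything useful.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

class TokenDump {
    // Hypothetical helper: describes every token in the source, one per line.
    static String dump(String source) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        tokenizer.eolIsSignificant(true); // without this, TT_EOL is never returned
        StringBuilder out = new StringBuilder();
        int type;
        while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            switch (type) {
                case StreamTokenizer.TT_EOL:
                    out.append("EOL\n");
                    break;
                case StreamTokenizer.TT_WORD:
                    out.append("WORD ").append(tokenizer.sval).append('\n');
                    break;
                case StreamTokenizer.TT_NUMBER:
                    // Numbers arrive in nval (a double), not in sval.
                    out.append("NUMBER ").append(tokenizer.nval).append('\n');
                    break;
                case '"':
                    out.append("STRING ").append(tokenizer.sval).append('\n');
                    break;
                default:
                    // Ordinary characters such as '=' or ';' are returned as their own code.
                    out.append("CHAR ").append((char) type).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(dump("int x = 42;\n"));
    }
}
```

Reading from a StringReader keeps the example independent of test.java; swapping in the BufferedReader from the question should work the same way.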
Upvotes: -1
Reputation: 16060
The StreamTokenizer constructor JavaDoc states:
All byte values '\u0000' through '\u0020' are considered to be white space.
and \t is \u0009, which falls in that range... You can use the whitespaceChars() method to change this behavior.
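If changing the character tables does not give the wanted output, note that the quoteChar() JavaDoc says the usual escape sequences such as "\n" and "\t" are recognized and converted to single characters as the string is parsed. One possible workaround, sketched below, is to turn them back into escape sequences after reading sval. The class name KeepEscapes and the helpers reEscape() and quotedStrings() are hypothetical, and the mapping is lossy by assumption: an octal escape in the input that produced a tab would also come back as \t.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class KeepEscapes {
    // Hypothetical helper: turns special characters back into Java escape sequences.
    static String reEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Returns the re-escaped body of every double-quoted token in the source text.
    static List<String> quotedStrings(String source) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        List<String> result = new ArrayList<>();
        int type;
        while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == '"') {
                result.add(reEscape(tokenizer.sval));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // The Java literal below holds the characters: String str = "STRIN\tG";
        for (String s : quotedStrings("String str = \"STRIN\\tG\";")) {
            System.out.println(s); // prints STRIN\tG
        }
    }
}
```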
A side note: if you println() a string containing \t, most/all terminals will move the cursor to the next tab position instead of actually printing \t...
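A small sketch of that difference, with a made-up class name TabVersusEscape and helper hasRealTab(): the first string holds a real tab character, the second holds a backslash followed by the letter t, which is what has to be in the string for \t to show up on screen.

```java
class TabVersusEscape {
    // Hypothetical helper: true if the string holds a real tab character (U+0009).
    static boolean hasRealTab(String s) {
        return s.indexOf('\t') >= 0;
    }

    public static void main(String[] args) {
        String realTab = "STRIN\tG";     // 7 chars, the sixth one is U+0009
        String backslashT = "STRIN\\tG"; // 8 chars, backslash followed by 't'
        System.out.println(realTab);     // terminal jumps to the next tab stop
        System.out.println(backslashT);  // prints STRIN\tG
        System.out.println(realTab.length() + " vs " + backslashT.length()); // prints 7 vs 8
    }
}
```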
Cheers,
Upvotes: 1