StreamTokenizer unescape characters

Question

I'm using Java's StreamTokenizer to tokenize source code read as text.
When an escape sequence appears in a string literal, the tokenizer unescapes it, but I want to keep the string exactly as written.

For example:

Input: String str = "STRIN\tG";

StreamTokenizer Output: STRIN    G
Wanted Output: STRIN\tG

My code:

BufferedReader reader = new BufferedReader(new FileReader("test.java"));
StreamTokenizer tokenizer = new StreamTokenizer(reader);

boolean eof = false;
do {
    int type = tokenizer.nextToken();
    switch (type) {
        case StreamTokenizer.TT_EOF:
            eof = true;
            break;

        case '"':
            // sval holds the contents of the quoted string
            System.out.println(tokenizer.sval);
            break;
    }
} while (!eof);

EDIT
I chose to work with StreamTokenizer because it handles comment removal well.

Anders R. Bystrup · Accepted Answer

The StreamTokenizer constructor JavaDoc states:

All byte values '\u0000' through '\u0020' are considered to be white space.

and \t is \u0009, which falls within that range. You can use the whitespaceChars() method to change this behavior.
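
To illustrate how the character classification can be changed, here is a minimal sketch (not from the original answer). It uses ordinaryChar() rather than whitespaceChars(), because ordinaryChar() is the call that clears a character's whitespace attribute. The class name TabClassificationDemo and the helper tokens() are made up for this example. Note that this only affects tabs outside quoted strings; the contents of a quoted string are parsed separately.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TabClassificationDemo {

    // Hypothetical helper: tokenizes the input and describes each token.
    // When makeTabOrdinary is true, the tab is removed from the default
    // whitespace set before tokenizing.
    static List<String> tokens(String input, boolean makeTabOrdinary) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        if (makeTabOrdinary) {
            tokenizer.ordinaryChar('\t');
        }
        List<String> result = new ArrayList<>();
        int type;
        while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == StreamTokenizer.TT_WORD) {
                result.add("WORD:" + tokenizer.sval);
            } else {
                // Ordinary characters are returned as their own code point
                result.add("CHAR:" + type);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String input = "a\tb"; // a real tab between two words
        System.out.println(tokens(input, false)); // [WORD:a, WORD:b]
        System.out.println(tokens(input, true));  // [WORD:a, CHAR:9, WORD:b]
    }
}
```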

A side note: If you println() a string containing a tab character, most/all terminals will move the cursor to the next tab position instead of printing the escape sequence literally.
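
A possible workaround, sketched here as an assumption rather than as part of the original answer: according to the quoteChar() Javadoc, escape sequences such as \n and \t inside a quoted string are converted to single characters while the string is parsed, so sval already contains a real tab. One option is to re-escape sval before printing. The class ReescapeDemo and the helpers reescape() and firstString() are hypothetical names, and the helper covers only a few common escapes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class ReescapeDemo {

    // Hypothetical helper: turns a few control characters back into
    // their escape sequences. It does not cover every escape.
    static String reescape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: returns the contents of the first
    // double-quoted string token, or null if there is none.
    static String firstString(String source) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        int type;
        while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == '"') {
                return tokenizer.sval;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // The source text contains a backslash followed by 't', as in the question
        String source = "String str = \"STRIN\\tG\";";
        String raw = firstString(source);
        System.out.println(raw.indexOf('\t')); // 5: sval holds a real tab
        System.out.println(reescape(raw));     // STRIN\tG
    }
}
```

This is lossy: after tokenizing, a tab that was typed literally inside the string cannot be told apart from one written as \t, so both come out as \t.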

Cheers,
