Presen

Reputation: 1857

StreamTokenizer unescape characters

I'm using Java's StreamTokenizer to tokenize Java source code.
When an escape sequence appears inside a string literal, the tokenizer unescapes it, but I want the string to stay exactly as written.

For example:

Input: String str = "STRIN\tG";

StreamTokenizer Output: STRIN    G
Wanted Output: STRIN\tG

My code:

BufferedReader reader = new BufferedReader(new FileReader("test.java"));
StreamTokenizer tokenizer = new StreamTokenizer(reader);

boolean eof = false;
do {
    int type = tokenizer.nextToken();
    switch (type) {
        case StreamTokenizer.TT_EOF:
            eof = true;
            break;

        case '"':
            System.out.println(tokenizer.sval);
            break;
    }
} while (!eof);

EDIT
I chose StreamTokenizer because it handles comment removal well.

Upvotes: 4

Views: 563

Answers (2)

Nitin Dandriyal

Reputation: 1607

Add the default case and handle the character as you wish to:

    switch (type) {
        case StreamTokenizer.TT_EOL:
            System.out.println("End of Line encountered.");
            break;
        case StreamTokenizer.TT_WORD:
            System.out.print(tokenizer.sval);
            break;
        case StreamTokenizer.TT_EOF:
            eof = true;
            break;
        case '"':
            System.out.println(tokenizer.sval);
            break;
        default:
            System.out.print((char) type);
    }

Upvotes: -1

Anders R. Bystrup

Reputation: 16060

The StreamTokenizer constructor JavaDoc states:

All byte values '\u0000' through '\u0020' are considered to be white space.

and \t is \u0009, which falls inside that range. You can use the whitespaceChars() method to change this behavior.

A side note: if you println() a string containing a tab character, most terminals will move the cursor to the next tab stop instead of printing the two characters \t.
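If the goal is just to see \t again in the output, one possible workaround is to re-escape sval after the tokenizer has decoded it. The following is only a rough sketch under some assumptions: ReEscapeDemo, reEscape and quotedStrings are made-up names for illustration, not part of the JDK, and the mapping is lossy (a literal tab and a written \t both come back as \t, and octal escapes are not restored to their original spelling).

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReEscapeDemo {

    // Hypothetical helper: turns decoded characters back into Java-style escapes.
    static String reEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: collects every double-quoted token, re-escaped.
    static List<String> quotedStrings(String source) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        // Keep the comment skipping that motivated using StreamTokenizer.
        tokenizer.slashSlashComments(true);
        tokenizer.slashStarComments(true);
        List<String> result = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == '"') {
                    result.add(reEscape(tokenizer.sval));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // The source text contains a backslash followed by 't', not a real tab.
        String source = "String str = \"STRIN\\tG\"; // a comment";
        for (String s : quotedStrings(source)) {
            System.out.println(s);
        }
    }
}
```

If the exact original spelling matters (for example octal escapes), re-escaping will not be enough and the quoted text would have to be read without going through the tokenizer's quote handling.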

Cheers,

Upvotes: 1
