bouncingHippo

Reputation: 6040

StringBuffer class and Chinese character encoding

I have written a method to return a string containing Chinese characters.

public String printChineseMenu() throws UnsupportedEncodingException {
   StringBuffer buffer = new StringBuffer();
   buffer.append(chineseStringFromDb);                 // placeholder: Chinese string returned from DB; characters appear in SQL
   System.out.println(buffer);                         // they appear as question marks
   PrintStream out = new PrintStream(System.out, true, "UTF-8");
   out.println(buffer);                                // Chinese characters appear

   return (buffer.toString());
}

Is there a better type than the StringBuffer class to store or return a string of Chinese characters?

Upvotes: 3

Views: 6345

Answers (2)

Peter Lawrey

Reputation: 533880

Your best option is to return a String. This is because a String is immutable and can store more information than a single character.

When you print text, you need to ensure you write the data using the encoding that whatever reads it expects. For example, if you redirect the output to a file and your reader expects UTF-8, you must write it as UTF-8.

The problem with System.out used alone is that it doesn't write chars; instead it writes bytes and assumes an encoding, which might not be the one you need.
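To illustrate the point about bytes and encodings, here is a minimal sketch that sends the same string through a PrintStream with two different charsets and captures the bytes. The class name EncodingDemo, the helper encodeVia and the sample text are assumptions made for the example, not part of the question's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: writes text through a PrintStream that uses the
    // named charset and returns the bytes that were actually produced.
    static byte[] encodeVia(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String menu = "\u83dc\u5355"; // two Chinese characters, written as escapes

        // UTF-8 can represent both characters (three bytes each).
        byte[] utf8 = encodeVia(menu, "UTF-8");
        System.out.println(utf8.length); // prints 6

        // US-ASCII cannot, so each character is replaced with '?'.
        byte[] ascii = encodeVia(menu, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints ??
    }
}
```

The string is the same in both cases; only the charset chosen when writing determines whether the characters survive, which matches the question marks seen with the plain System.out.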

Upvotes: 3

Jon Skeet

Reputation: 1503869

The problem here isn't StringBuffer - it's simply the encoding used by System.out. You'd find the exact same behaviour when printing the string directly, without using a StringBuffer.

StringBuffer (and its more modern, non-thread-safe equivalent, StringBuilder, which you should use instead) don't care about encoding themselves - they just use sequences of UTF-16 code units. They will correctly preserve all Unicode data. The same is true for String.
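As a small sketch of that claim, the following builds a string from Chinese pieces with StringBuilder and checks that nothing is lost. The class BuilderDemo, the method buildMenu and the sample strings are hypothetical names chosen for illustration.

```java
class BuilderDemo {
    // Hypothetical helper: concatenates the parts with a StringBuilder
    // and returns the result as a String.
    static String buildMenu(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String menu = buildMenu("\u83dc", "\u5355");
        // The built string is identical to the literal, code unit for code unit.
        System.out.println(menu.equals("\u83dc\u5355")); // prints true

        // A character outside the Basic Multilingual Plane (U+20000) is kept
        // as a surrogate pair: two UTF-16 code units, one code point.
        String rare = buildMenu("\uD840\uDC00");
        System.out.println(rare.length());                          // prints 2
        System.out.println(rare.codePointCount(0, rare.length()));  // prints 1
    }
}
```

No encoding is involved at any point here; encoding only comes into play when the string is converted to bytes for output.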

Your method should almost certainly just return a String - but if you don't need to do any "building" with the string (appending other pieces) then there's no point in using either StringBuffer or StringBuilder. If you do need to build up the result string from multiple strings, you should be fine to use either of them, and just return the result of toString() as you are already doing (although the brackets around the return value are irrelevant; return isn't a method).

Consoles can often be misleading when it comes to string data. When in doubt, print out the sequence of UTF-16 code units one at a time, and then work out what that means. That way there's no danger of encodings or unprintable characters becoming an issue.
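One possible way to do that is sketched below: each UTF-16 code unit is printed as four hex digits, so the output is plain ASCII and cannot be mangled by the console. The class CodeUnitDump and the method dump are illustrative names, not an existing API.

```java
class CodeUnitDump {
    // Hypothetical helper: formats every UTF-16 code unit of the text as
    // four lowercase hex digits, separated by spaces.
    static String dump(CharSequence text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Real Chinese characters show up as their own code units...
        System.out.println(dump("\u83dc\u5355")); // prints 83dc 5355
        // ...whereas data that was already damaged shows up as 003f ('?').
        System.out.println(dump("??"));           // prints 003f 003f
    }
}
```

If the dump of the string coming back from the database shows values such as 83dc, the data is intact and only the console output encoding is at fault; if it shows 003f, the characters were lost before they reached the string.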

Upvotes: 4
