Allan Jiang

Reputation: 11341

Android - Proper String Encoding for All Languages

I am working on an Android app that involves displaying strings in multiple languages. For example, Chinese might need UTF-8 encoding, whereas Japanese might need Shift_JIS. Is there a generic solution for properly displaying strings in all (or most) languages?

Thank you!

Upvotes: 1

Views: 654

Answers (1)

Takahiko Kawasaki

Reputation: 19021

You only have to worry about UTF-8 or Shift_JIS at two points: when you construct a String object from an external source (such as a file), and when you convert a String object to an external form (such as a file). Once you already have a String object, you don't have to worry about UTF-8 or Shift_JIS at all.
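
The decoding step is where things go wrong if the wrong charset is named. The following sketch decodes the same two Shift_JIS bytes once with the correct charset and once with UTF-8. The class name DecodingBoundary and its decode helper are hypothetical, for illustration only. It uses the String(byte[], Charset) constructor, which avoids the checked UnsupportedEncodingException thrown by the String(byte[], String) overload. Shift_JIS is not one of the charsets every Java platform is required to support; it is assumed to be available here (it normally is on desktop JDKs and on Android), and Charset.forName throws UnsupportedCharsetException if it is not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same bytes yield different Strings
// depending on which charset is used to decode them.
class DecodingBoundary {

    // HIRAGANA LETTER A (U+3042) encoded in Shift_JIS.
    static final byte[] SHIFT_JIS_BYTES = { (byte) 0x82, (byte) 0xA0 };

    // Decode raw bytes with an explicitly named charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumes Shift_JIS is installed on this platform.
        Charset shiftJis = Charset.forName("Shift_JIS");

        String correct = decode(SHIFT_JIS_BYTES, shiftJis);
        String wrong = decode(SHIFT_JIS_BYTES, StandardCharsets.UTF_8);

        System.out.println(correct.equals("\u3042")); // true
        // 0x82 and 0xA0 are not a valid UTF-8 sequence, so the decoder
        // substitutes U+FFFD REPLACEMENT CHARACTER: the text is garbled.
        System.out.println(wrong.indexOf('\uFFFD') >= 0); // true
    }
}
```

After the correct decode, the result is an ordinary String; nothing downstream needs to know that it originally came from Shift_JIS.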

When you construct a String object:

// HIRAGANA LETTER A (U+3042), encoded in UTF-8.
byte[] rawDataEncodedInUTF8 = { (byte)0xE3, (byte)0x81, (byte)0x82 };
// Convert to a String object from the bytes.
String a1 = new String(rawDataEncodedInUTF8, "UTF-8");

// HIRAGANA LETTER A (U+3042), encoded in Shift_JIS.
byte[] rawDataEncodedInShiftJIS = { (byte)0x82, (byte)0xA0 };
// Convert to a String object from the bytes.
String a2 = new String(rawDataEncodedInShiftJIS, "Shift_JIS");

// Both a1 and a2 represent HIRAGANA LETTER A (U+3042).
// So, a1.equals(a2) is true.

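// String.charAt(int) returns the UTF-16 code unit at the index,
// so c here is 0x3042. Note that a Unicode code point (U+3042)
// and a UTF-16 code unit (0x3042) are different concepts; they
// happen to have the same value here only because U+3042 lies
// in the Basic Multilingual Plane.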
char c = a1.charAt(0);
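
The difference between a code point and a UTF-16 code unit becomes visible for characters outside the Basic Multilingual Plane. The sketch below uses U+20BB7 (a kanji variant of 吉), which Java stores as the surrogate pair 0xD842 0xDFB7: charAt sees two chars, while codePointAt and codePointCount see one character. The class name and the codePointsOf helper are hypothetical; the helper only uses String and Character methods available on all Android versions.

```java
// Hypothetical demo class: chars (UTF-16 code units) vs. code points.
class CodeUnitsVsCodePoints {

    // Walk a String by code point rather than by char.
    static int[] codePointsOf(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int i = 0;
        for (int offset = 0; offset < s.length(); ) {
            int cp = s.codePointAt(offset);
            result[i++] = cp;
            offset += Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
        }
        return result;
    }

    public static void main(String[] args) {
        String hiragana = "\u3042";     // U+3042, inside the BMP
        String kanji = "\uD842\uDFB7";  // U+20BB7, outside the BMP

        // BMP character: one char, equal in value to the code point.
        System.out.println(hiragana.length());                         // 1
        System.out.println(Integer.toHexString(hiragana.charAt(0)));   // 3042

        // Supplementary character: two chars, but only one code point.
        System.out.println(kanji.length());                            // 2
        System.out.println(codePointsOf(kanji).length);                // 1
        System.out.println(Integer.toHexString(kanji.charAt(0)));      // d842
        System.out.println(Integer.toHexString(kanji.codePointAt(0))); // 20bb7
    }
}
```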

When you build an external form:

String text = ...;

byte[] rawDataEncodedInUTF8     = text.getBytes("UTF-8");
byte[] rawDataEncodedInShiftJIS = text.getBytes("Shift_JIS");
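
Encoding is the other boundary, and not every String can be represented in every encoding. String.getBytes(Charset) silently replaces characters that the target charset cannot map with that charset's default replacement bytes, so the loss goes unnoticed. The sketch below checks this up front with CharsetEncoder.canEncode. The class name EncodingCoverage and its fits helper are hypothetical, and Shift_JIS is assumed to be installed (Charset.forName throws UnsupportedCharsetException otherwise).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: which Strings survive which encodings.
class EncodingCoverage {

    // True if every character in text can be encoded in charset.
    // A fresh encoder is created per call because canEncode may change encoder state.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset shiftJis = Charset.forName("Shift_JIS");
        String hiragana = "\u3042";     // HIRAGANA LETTER A
        String emoji = "\uD83D\uDE00";  // U+1F600 GRINNING FACE

        System.out.println(fits(hiragana, shiftJis));            // true
        System.out.println(fits(emoji, shiftJis));               // false
        System.out.println(fits(emoji, StandardCharsets.UTF_8)); // true

        // The same String produces different byte counts per encoding.
        System.out.println(hiragana.getBytes(StandardCharsets.UTF_8).length); // 3
        System.out.println(hiragana.getBytes(shiftJis).length);               // 2
    }
}
```

UTF-8 can encode any well-formed String, which is one reason it is the safer default for external storage.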

First, you need to understand (1) the difference between Unicode and its encodings (UTF-8/UTF-16BE/UTF-16LE/...) and (2) that Java strings are Unicode (internally, a sequence of UTF-16 code units), so a single String can hold text in any language. Then, I recommend using UTF-8 whenever you save data to files, databases, or any other external storage.
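
Following that recommendation, here is a minimal sketch of saving and loading text with UTF-8 named explicitly at the file boundary, instead of relying on whatever default charset FileReader and FileWriter pick up. The class name Utf8FileStore and its save/load methods are hypothetical. On Android it assumes API 19 or later, since StandardCharsets and try-with-resources need it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the charset is chosen explicitly on both sides.
class Utf8FileStore {

    // String -> UTF-8 bytes on disk.
    static void save(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // UTF-8 bytes on disk -> String.
    static String load(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();
        // Chinese, Japanese and Korean text in one String.
        String text = "\u4E2D\u6587 \u65E5\u672C\u8A9E \uD55C\uAD6D\uC5B4";
        save(file, text);
        System.out.println(load(file).equals(text)); // true
    }
}
```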

Upvotes: 2
