Reputation: 11341
I am working on an Android app that involves displaying strings in multiple languages. For example, Chinese might need UTF-8
encoding, whereas Japanese might need Shift_JIS.
I was wondering if there is a generic solution for properly displaying strings in all (or most) languages?
Thank you!
Upvotes: 1
Views: 654
Reputation: 19021
You have to worry about UTF-8 or Shift_JIS only when you construct a String object from an external source (such as a file) and when you convert a String object to an external form (such as a file). Conversely, if you already have a String object, you don't have to worry about UTF-8 or Shift_JIS.
When you construct a String object:
// HIRAGANA LETTER A (U+3042), encoded in UTF-8.
byte[] rawDataEncodedInUTF8 = { (byte)0xE3, (byte)0x81, (byte)0x82 };
// Convert to a String object from the bytes.
String a1 = new String(rawDataEncodedInUTF8, "UTF-8");
// HIRAGANA LETTER A (U+3042), encoded in Shift_JIS.
byte[] rawDataEncodedInShiftJIS = { (byte)0x82, (byte)0xA0 };
// Convert to a String object from the bytes.
String a2 = new String(rawDataEncodedInShiftJIS, "Shift_JIS");
// Both a1 and a2 represent HIRAGANA LETTER A (U+3042).
// So, a1.equals(a2) is true.
// String.charAt(int) returns the UTF-16 code unit at the index,
// so c here is 0x3042. Note that the code point 'U+3042' and the
// UTF-16 code unit 0x3042 are different concepts; their values
// coincide only for characters in the Basic Multilingual Plane.
char c = a1.charAt(0);
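The decoding step above can be put into a small runnable form. This is only a sketch: the class name `DecodeDemo` and the helper `decode` are made up for illustration, and it assumes the runtime provides the `Shift_JIS` charset (standard desktop JDKs and Android normally do, but only a few charsets such as UTF-8 are guaranteed everywhere).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class DecodeDemo {
    // Decode raw bytes using the charset they were actually encoded in.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // HIRAGANA LETTER A (U+3042) in two different encodings.
        byte[] utf8 = { (byte) 0xE3, (byte) 0x81, (byte) 0x82 };
        byte[] sjis = { (byte) 0x82, (byte) 0xA0 };

        String a1 = decode(utf8, StandardCharsets.UTF_8);
        // Assumption: Shift_JIS is available on this runtime.
        String a2 = decode(sjis, Charset.forName("Shift_JIS"));

        System.out.println(a1.equals(a2));                     // true
        System.out.println(Integer.toHexString(a1.charAt(0))); // 3042

        // Decoding with the wrong charset does not throw; it silently
        // yields replacement characters, so the result differs.
        String wrong = decode(sjis, StandardCharsets.UTF_8);
        System.out.println(wrong.equals(a1));                  // false
    }
}
```

Passing a `Charset` object instead of a charset name also avoids the checked `UnsupportedEncodingException` that `new String(bytes, "UTF-8")` declares.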
When you build an external form:
String text = ...;
byte[] rawDataEncodedInUTF8 = text.getBytes("UTF-8");
byte[] rawDataEncodedInShiftJIS = text.getBytes("Shift_JIS");
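To illustrate why the choice of encoding matters when building an external form, here is a rough sketch comparing the two encodings. The class `EncodeDemo` and its helper methods are hypothetical names, and the example again assumes `Shift_JIS` is available on the runtime. It shows that the same String yields different byte sequences, and that Shift_JIS cannot represent every character (a Hangul syllable is used as the example).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class EncodeDemo {
    // Assumption: Shift_JIS is available on this runtime.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // True if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // True if encoding and then decoding gives back the original text.
    static boolean survivesRoundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset).equals(text);
    }

    public static void main(String[] args) {
        String hiraganaA = "\u3042"; // HIRAGANA LETTER A
        String hangul = "\uD55C";    // HANGUL SYLLABLE HAN

        // Same String, different external forms.
        System.out.println(hiraganaA.getBytes(StandardCharsets.UTF_8).length); // 3
        System.out.println(hiraganaA.getBytes(SJIS).length);                   // 2

        // UTF-8 can represent any Unicode character; Shift_JIS cannot.
        System.out.println(canEncode(hangul, StandardCharsets.UTF_8)); // true
        System.out.println(canEncode(hangul, SJIS));                   // false

        // getBytes substitutes a replacement byte for unmappable
        // characters, so the original text is lost.
        System.out.println(survivesRoundTrip(hangul, SJIS));           // false
    }
}
```

This is one reason UTF-8 is the usual choice for an app that has to handle many languages at once.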
First, you need to understand (1) the difference between Unicode and its encodings (UTF-8/UTF-16BE/UTF-16LE/...) and (2) that Java strings are Unicode (stored internally as UTF-16). Then, I recommend you use UTF-8 when you save data into files, databases, or any other external storage.
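As a minimal sketch of the "save as UTF-8" recommendation, the following writes mixed-language text to a file and reads it back with the charset stated explicitly on both sides. The class `Utf8FileDemo` and its methods are hypothetical, and the example assumes `java.nio.file` is available (on Android that means API level 26 or later; on older versions the same idea works with `OutputStreamWriter` and `InputStreamReader` constructed with `StandardCharsets.UTF_8`).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of any library.
public class Utf8FileDemo {
    // Write the text as UTF-8 bytes, never relying on the platform default.
    static void save(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Read the bytes back and decode them with the same charset.
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Save to a temporary file, load it again, and return what was read.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                save(file, text);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Chinese, Japanese, Korean and English in one String.
        String text = "\u4E2D\u6587 \u3042 \uD55C English";
        System.out.println(text.equals(roundTrip(text))); // true
    }
}
```

The key point is that the writer and the reader name the same charset; once the text is back in a String, displaying it needs no further encoding handling.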
Upvotes: 2