Reputation: 177
So I have a char[] array that contains text and other data.
How can I extract Chinese text from the char[] array? Right now I can get English text fine with:
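public String getString(int index, int length) {
    // Append chars until `length` is reached or a 0 terminator is found.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < length && this.data[index + i] != 0; i++) {
        sb.append(this.data[index + i]);
    }
    return sb.toString();
}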
Then I tried this:
try {
    String charset = "GB18030";
    String str = new String(m.target.getBytes("UTF-16"), charset);
    System.out.println(str);
    System.out.println(str.equals("大家"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
m.target is a String I got from the char[] array with getString() above. I've tried various encodings and combinations of them, but none displays the text correctly (大家) and none returns true for str.equals("大家").
EDIT
Using this method I can successfully get the Chinese characters:
public String test(int index, int length) {
    byte[] t = new byte[this.data.length];
    for (int i = 0; i < this.data.length; i++)
        t[i] = (byte) this.data[i];
    try {
        return new String(t, index, length, "GB18030");
    } catch (UnsupportedEncodingException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return null;
}
But my question now is: I thought the maximum value a byte could hold was 127. How can the byte array hold the high bytes of Chinese characters? Can I safely change the buffer to byte[] instead of char[]?
Upvotes: 4
Views: 14764
Reputation: 22847
Both char and String in Java are Unicode (a char is a UTF-16 code unit). You don't have to care about encodings as long as you operate on text inside Java code. You specify an encoding only when converting to or from a byte[] array, or when reading from or writing to an IO stream.
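As a rough illustration of that conversion boundary, here is a minimal sketch that decodes and encodes GB18030 bytes. The class and method names are made up for this example, and it assumes the JDK in use ships the GB18030 charset. It also shows why a byte[] can carry the "high" bytes: a Java byte is signed (-128 to 127), but it still stores all 8 bits, so 0xB4 simply shows up as -76.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the post.
class EncodingDemo {
    // Assumption: the runtime provides the GB18030 charset.
    static final Charset GB18030 = Charset.forName("GB18030");

    // Decode raw GB18030 bytes into a Java String.
    static String decode(byte[] raw) {
        return new String(raw, GB18030);
    }

    // Encode a Java String into GB18030 bytes.
    static byte[] encode(String text) {
        return text.getBytes(GB18030);
    }

    public static void main(String[] args) {
        // The two characters from the question, as GB18030 bytes.
        byte[] raw = {(byte) 0xB4, (byte) 0xF3, (byte) 0xBC, (byte) 0xD2};

        // A Java byte is signed, so 0xB4 is stored as a negative value;
        // masking with 0xFF recovers the unsigned value in 0..255.
        System.out.println(raw[0]);         // -76
        System.out.println(raw[0] & 0xFF);  // 180

        String text = decode(raw);
        System.out.println(text.equals("\u5927\u5BB6"));       // true
        System.out.println(Arrays.equals(encode(text), raw));  // true
    }
}
```

If the buffer really holds encoded bytes rather than text, keeping it as a byte[] and decoding only the text ranges with the right charset is probably the simpler design.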
To declare a String containing Chinese characters, you can use escape sequences or just write the characters directly in the code, but then you must take care of the source file encoding. UTF-8 is the quasi-standard nowadays; it is supported by both IDEs (such as Eclipse) and build tools (Maven, Ant).
So you just write:
char ch = '大';
char[] chrs = new char[]{'大','家'};
String str = "大家";
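The escaped form mentioned above could look like the following sketch. The class name is hypothetical; the code points U+5927 and U+5BB6 are the two characters used in the question. Because the literal is pure ASCII, it does not depend on the source file encoding.

```java
// Hypothetical demo class showing the escaped form of the same text.
class EscapeDemo {
    // U+5927 and U+5BB6 written as Unicode escapes; the source stays pure ASCII.
    static final String ESCAPED = "\u5927\u5BB6";

    public static void main(String[] args) {
        System.out.println(ESCAPED.length());                  // 2
        System.out.println((int) ESCAPED.charAt(0) == 0x5927); // true
        System.out.println((int) ESCAPED.charAt(1) == 0x5BB6); // true
    }
}
```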
To read Chinese characters from, for example, a UTF-16 encoded file, use an InputStreamReader with the proper encoding; you can then read strings, e.g. with the help of a BufferedReader:
BufferedReader reader = new BufferedReader(new InputStreamReader(
        new FileInputStream("myfile.txt"), "UTF-16"));
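A slightly fuller sketch of the same idea, with the reader closed automatically, might look like this. The class, the method names, and the temporary file are assumptions made for the example; it writes a UTF-16 file first only so that it has something to read back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names are illustrative only.
class Utf16ReadDemo {
    // Read the first line of a UTF-16 encoded file.
    static String readFirstLine(Path file) throws IOException {
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_16))) {
            return reader.readLine();
        }
    }

    // Write text to a temporary UTF-16 file, then read it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf16-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_16));
                return readFirstLine(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u5927\u5BB6";
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```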
Upvotes: 4