peterboston

Reputation: 927

Why does a Chinese character take one char (2 bytes) in Java but 3 bytes in UTF-8?

I have the following program to test how Java handles Chinese characters:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

String s3 = "世界您好";
char[] chs = s3.toCharArray();
byte[] bs = s3.getBytes(StandardCharsets.UTF_8);
byte[] bs2 = new String(chs).getBytes(StandardCharsets.UTF_8);

System.out.println("encoding=" + Charset.defaultCharset().name() + ", " + s3 + " char[].length=" + chs.length
                + ", byte[].length=" + bs.length + ", byte[]2.length=" + bs2.length);

The output is:

encoding=UTF-8, 世界您好 char[].length=4, byte[].length=12, byte[]2.length=12

The results are:

  1. one Chinese character takes one char (2 bytes in Java) when a char[] holds the characters;

  2. one Chinese character takes 3 bytes when a byte[] holds the UTF-8 encoding;

My questions are: if 2 bytes are enough, why does UTF-8 use 3 bytes? And if 2 bytes are not enough, why does Java use only 2?

EDIT:

My JVM's default encoding is set to UTF-8.

Upvotes: 6

Views: 19792

Answers (1)

MiguelMunoz

Reputation: 4952

A Java char stores 16 bits of data in two bytes, using every bit for the data itself. UTF-8 doesn't do this. For a Chinese character like these (in the Basic Multilingual Plane), UTF-8 uses three bytes: the lead byte contributes 4 data bits and each of the two continuation bytes contributes 6, so 4 + 6 + 6 = 16 data bits; the remaining bits are markers that tell a decoder where a character starts and how many bytes it occupies. (The layout varies by character; an ASCII character uses all 7 data bits of a single byte.) It's a more complicated encoding mechanism, but it lets UTF-8 represent every Unicode code point up to U+10FFFF in at most 4 bytes. The advantage is that 7-bit (ASCII) characters take only one byte each, making UTF-8 backward compatible with ASCII. The cost is that it needs 3 bytes to carry 16 bits of data. You can learn how it works by looking it up on Wikipedia.
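
For a concrete look at this, here is a minimal sketch (the class name Utf8BitLayout is just illustrative, not from the question) that prints the three UTF-8 bytes of 世 in binary. The 1110 / 10 / 10 prefixes are the marker bits; the remaining 4 + 6 + 6 = 16 bits carry the character's data:

import java.nio.charset.StandardCharsets;

public class Utf8BitLayout {
    public static void main(String[] args) {
        String s = "世";                                    // U+4E16, one Java char (16 bits)
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);   // 3 bytes: E4 B8 96
        for (byte b : utf8) {
            // Pad each byte to 8 binary digits: 11100100 10111000 10010110
            String bits = String.format("%8s", Integer.toBinaryString(b & 0xFF))
                    .replace(' ', '0');
            System.out.println(bits);
        }
    }
}

Running it prints 11100100, 10111000, 10010110, which matches the 1110xxxx 10xxxxxx 10xxxxxx pattern described above.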

Upvotes: 5
