Piyush

Reputation: 638

If all Java Strings are UTF-16 strings, how can the char datatype have a max size of 2 bytes?

If Strings in Java are UTF-16, then a single UTF-16 character may take 4 bytes, so one UTF-16 character would have to map to 2 chars.

This would mean that a String's length may be less than the length of the equivalent char[].

But that is not the case.

Character x = new Character((char) 7000);
String s = new String("" + x + x + x);

byte[] ar = s.getBytes();
char[] arr = s.toCharArray();

The byte array has length 9.
The char array has length 3.
So how can a char have a size of 2 bytes?

So I think a char in Java may be larger than 2 bytes depending on the need. Is that correct?

If so, what is the max size of a char in Java? Or is it variable-length, so that it may go up to infinity in the future?

Upvotes: 0

Views: 1751

Answers (1)

chiastic-security

Reputation: 20520

The String.getBytes() call doesn't return the UTF-16 internal representation. It returns the string in the platform's default encoding. In your case, that is quite likely to be UTF-8 (though, being a platform-determined thing, you'd need to check to be sure). The UTF-8 encoded form of (char)7000 (Unicode codepoint U+1B58 BALINESE DIGIT EIGHT) is 3 bytes - E1 AD 98. Hence your 9 bytes for 3 chars.
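To see this for yourself, here is a minimal sketch that passes an explicit charset to getBytes() instead of relying on the platform default. The class name CharEncodingDemo and the helpers utf8Length and utf16Length are made up for illustration. UTF_16BE is used because it writes no byte-order mark, so the byte count is exactly 2 per char. The last few lines also cover the 4-byte case from your question: a code point outside the Basic Multilingual Plane (U+1F600 is picked here only as an example) is stored as two chars, a surrogate pair, and each char is still 2 bytes.

```java
import java.nio.charset.StandardCharsets;

class CharEncodingDemo {
    // Hypothetical helper: byte count of s when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: byte count of s when encoded as UTF-16BE
    // (big-endian, no byte-order mark), i.e. 2 bytes per char.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        char x = (char) 7000;                // U+1B58, same value as in the question
        String s = "" + x + x + x;

        System.out.println(s.length());      // 3 chars
        System.out.println(utf8Length(s));   // 9 bytes: 3 bytes per char in UTF-8
        System.out.println(utf16Length(s));  // 6 bytes: 2 bytes per char in UTF-16

        // A supplementary code point needs two chars (a surrogate pair).
        String supplementary = new String(Character.toChars(0x1F600));
        System.out.println(supplementary.length());                                  // 2 chars
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 1 code point
        System.out.println(utf16Length(supplementary));                              // 4 bytes
    }
}
```

So a char is always one 16-bit UTF-16 code unit; what varies is how many chars a single Unicode code point needs (one or two), and how many bytes a given encoding produces for it.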

Upvotes: 5
