What kind of UTF-8 encoding do the members of the String class in Java use?

The String class has a constructor:

 new String(byte[] bytes, Charset charset)

and a method:

 byte[] getBytes(Charset charset)

Given that I define my charset as follows:

 Charset charset = Charset.forName("UTF-8");

Which encoding will I actually be using? More specifically, is it standard UTF-8 (as described in RFC 3629), CESU-8, or Modified UTF-8? (See also the corresponding Wikipedia article.)

If it is not standard UTF-8, is there a library that allows String operations in UTF-8?

A converter for these UTF-8-derived encodings would be more than welcome!

Upvotes: 1

Views: 470

Answers (1)

Gunslinger47

Reputation: 7061

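It is standard UTF-8, not CESU-8 or Modified UTF-8. The Javadoc for Charset says: "The UTF-8 charset is specified by RFC 2279; the transformation format upon which it is based is specified in Amendment 2 of ISO 10646-1 and is also described in the Unicode Standard." RFC 2279 has since been obsoleted by RFC 3629. Modified UTF-8 does appear elsewhere in Java, for example in DataInput and DataOutput, but not in the String constructor or getBytes when given the UTF-8 charset.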

http://download-llnw.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
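One way to check this yourself is to encode a character outside the Basic Multilingual Plane and look at the bytes. Standard UTF-8 encodes it as a single 4-byte sequence, while Modified UTF-8 (and CESU-8) encode each surrogate separately, giving 6 bytes. The following is only a sketch: the class name Utf8Check and its helper methods are hypothetical names chosen for illustration, and DataOutputStream.writeUTF is used as the reference for Modified UTF-8 because its documentation says it writes that format after a 2-byte length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class Utf8Check {

    // Encode via String.getBytes(Charset), as in the question.
    static byte[] standardUtf8(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    // Encode via DataOutputStream.writeUTF (Modified UTF-8),
    // then drop the 2-byte length prefix that writeUTF emits first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // U+1F600, a surrogate pair in Java
        System.out.println(hex(standardUtf8(emoji))); // F0 9F 98 80
        System.out.println(hex(modifiedUtf8(emoji))); // ED A0 BD ED B8 80

        String nul = "\u0000";
        System.out.println(hex(standardUtf8(nul))); // 00
        System.out.println(hex(modifiedUtf8(nul))); // C0 80
    }
}
```

The 4-byte result for U+1F600 and the single 00 byte for NUL are what RFC 3629 requires, which suggests that the UTF-8 charset is the standard encoding rather than one of the derived variants.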

Upvotes: 3
