Reputation: 8542
The String class has a constructor:
new String(byte[] bytes, Charset charset)
and a method:
byte[] getBytes(Charset charset)
Given that I define my charset as follows:
Charset charset = Charset.forName("UTF-8");
Which encoding will I actually be using? More specifically, is it standard UTF-8 (as described in RFC 3629), CESU-8, or Modified UTF-8? (See also the corresponding Wikipedia article.)
If it is not standard UTF-8, is there a library that allows String operations in UTF-8?
A converter for these UTF-8-derived encodings would be more than welcome!
Upvotes: 1
Views: 470
Reputation: 7061
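From the Charset Javadoc: "The UTF-8 charset is specified by RFC 2279; the transformation format upon which it is based is specified in Amendment 2 of ISO 10646-1 and is also described in the Unicode Standard."
RFC 2279 has since been superseded by RFC 3629, so what you get from Charset.forName("UTF-8") is standard UTF-8, not CESU-8 or Modified UTF-8. Java does use Modified UTF-8 elsewhere, in places such as DataInput.readUTF and DataOutput.writeUTF.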
http://download-llnw.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
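If you want to check this yourself, here is a minimal sketch that compares the bytes produced by the "UTF-8" charset with those produced by DataOutputStream.writeUTF, which the JDK documents as modified UTF-8. The class name Utf8Check and its helper methods are made up for this illustration. The two encodings differ for supplementary characters (4 bytes in standard UTF-8, 6 bytes as an encoded surrogate pair in modified UTF-8) and for U+0000 (1 byte versus 2).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf8Check {
    // Encode with the "UTF-8" charset, exactly as in the question.
    static byte[] standardUtf8(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    // Encode with DataOutputStream.writeUTF (modified UTF-8).
    // writeUTF emits a 2-byte length prefix first; it is stripped here
    // so that only the encoded characters are returned.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so it is a surrogate pair in a Java String.
        String supplementary = new String(Character.toChars(0x1F600));
        String nul = String.valueOf((char) 0);

        System.out.println("supplementary, charset UTF-8: " + standardUtf8(supplementary).length + " bytes");
        System.out.println("supplementary, writeUTF:      " + modifiedUtf8(supplementary).length + " bytes");
        System.out.println("U+0000, charset UTF-8:        " + standardUtf8(nul).length + " bytes");
        System.out.println("U+0000, writeUTF:             " + modifiedUtf8(nul).length + " bytes");

        // Decoding with the same charset restores the original string.
        Charset charset = Charset.forName("UTF-8");
        String roundTrip = new String(standardUtf8(supplementary), charset);
        System.out.println("round trip ok: " + roundTrip.equals(supplementary));
    }
}
```

Since a String decoded by either route holds the same UTF-16 content, converting between the variants amounts to decoding with one API and encoding with the other.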
Upvotes: 3