Reputation: 1073
Please see JLS 7. Section 3.2 (page 16) states:
The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.
I disassembled a hello world program.
class Y {
    String hello = "hello";
}
Here is the disassembly:
Classfile /c:/Work/SR1/e2/tmp/Y.class
Last modified Jan 5, 2014; size 240 bytes
MD5 checksum 96694fda4f346a62d5412c56dc36c45d
Compiled from "X.java"
class Y
SourceFile: "X.java"
minor version: 0
major version: 52
flags: ACC_SUPER
Constant pool:
#1 = Class #2 // Y
#2 = Utf8 Y
#3 = Class #4 // java/lang/Object
#4 = Utf8 java/lang/Object
#5 = Utf8 hello
#6 = Utf8 Ljava/lang/String;
#7 = Utf8 <init>
#8 = Utf8 ()V
#9 = Utf8 Code
#10 = Methodref #3.#11 // java/lang/Object."<init>":()V
#11 = NameAndType #7:#8 // "<init>":()V
#12 = String #5 // hello
#13 = Fieldref #1.#14 // Y.hello:Ljava/lang/String;
#14 = NameAndType #5:#6 // hello:Ljava/lang/String;
#15 = Utf8 LineNumberTable
#16 = Utf8 SourceFile
#17 = Utf8 X.java
{
...
I see only Utf8 entries and no Utf16. Why is there no UTF-16 encoding?
Thanks
Upvotes: 1
Views: 203
Reputation: 718788
In an executing program, text is typically represented in UTF-16 (see note 1).
But in a ".class" file, text in the constant pool (i.e. String literals, identifiers, and so on) is encoded in UTF-8 (strictly, a modified form of UTF-8) to save space. (Encoding of constant pool entries in UTF-8 is mandated by the JVM spec - Section 4.4 ... and has nothing to do with default character sets.)
When the class file is loaded, the UTF-8 constant pool entries are transcoded to UTF-16 by the classloader.
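As a rough illustration of that round trip (not the class loader's actual code), DataOutputStream.writeUTF and DataInputStream.readUTF use the same modified UTF-8 format that the class file format uses for Utf8 constant pool entries. The class name ModifiedUtf8Demo and its helper methods are made up for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class: mimics "stored as modified UTF-8, used as UTF-16".
public class ModifiedUtf8Demo {

    // Writes a 2-byte length prefix followed by the modified UTF-8 bytes.
    public static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    // Reads the bytes back into a String, whose API exposes UTF-16 code units.
    public static String decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String hello = "hello";
        byte[] encoded = encode(hello);

        // 5 payload bytes for "hello"; UTF-16 would need 10.
        System.out.println("modified UTF-8 payload bytes: " + (encoded.length - 2));

        String decoded = decode(encoded);
        // length() counts 16-bit code units, not bytes.
        System.out.println("UTF-16 code units: " + decoded.length());
        System.out.println("round trip ok: " + decoded.equals(hello));
    }
}
```

For ASCII-only text such as "hello" the on-disk form is half the size of the UTF-16 form, which is the space saving mentioned above.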
1 - An application could be written to encode text in a myriad of different ways. The UTF-16 encoding we are talking about here is the natural encoding scheme for text data in Java; i.e. the encoding you get when you store text in a String or any other subtype of CharSequence.
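To make the note concrete, here is a small sketch showing that an application can pick any byte encoding it likes for output, while the String API itself always works in 16-bit code units. The class name EncodingChoices and its helper methods are invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one String, several byte encodings.
public class EncodingChoices {

    // Number of bytes when the text is encoded as standard UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when the text is encoded as big-endian UTF-16 (no BOM).
    public static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "hello" with an e-acute (U+00E9)

        // The String API counts 16-bit code units.
        System.out.println("code units: " + text.length());

        // The application chooses the byte encoding explicitly when it needs bytes.
        System.out.println("UTF-8 bytes: " + utf8Length(text));
        System.out.println("UTF-16BE bytes: " + utf16Length(text));
    }
}
```

Here the same five-character String takes 6 bytes in UTF-8 (the accented letter needs two) and 10 bytes in UTF-16BE.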
Upvotes: 5