gudge

Reputation: 1073

Encoding in Java Programming Language

Please see JLS7. Section 3.2, page 16, states:

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.

I disassembled a hello world program.

class Y {
    String hello = "hello";
}

The following is the disassembly:

Classfile /c:/Work/SR1/e2/tmp/Y.class
Last modified Jan 5, 2014; size 240 bytes
MD5 checksum 96694fda4f346a62d5412c56dc36c45d
Compiled from "X.java"
class Y
  SourceFile: "X.java"
  minor version: 0
  major version: 52
  flags: ACC_SUPER
  Constant pool:
  #1 = Class              #2             //  Y
  #2 = Utf8               Y
  #3 = Class              #4             //  java/lang/Object
  #4 = Utf8               java/lang/Object
  #5 = Utf8               hello
  #6 = Utf8               Ljava/lang/String;
  #7 = Utf8               <init>
  #8 = Utf8               ()V
  #9 = Utf8               Code
  #10 = Methodref          #3.#11         //  java/lang/Object."<init>":()V
  #11 = NameAndType        #7:#8          //  "<init>":()V
  #12 = String             #5             //  hello
  #13 = Fieldref           #1.#14         //  Y.hello:Ljava/lang/String;
  #14 = NameAndType        #5:#6          //  hello:Ljava/lang/String;
  #15 = Utf8               LineNumberTable
  #16 = Utf8               SourceFile
  #17 = Utf8               X.java
  {
  ...

I see only Utf8 entries and no Utf16. Why is there no UTF-16 encoding?

Thanks

Upvotes: 1

Views: 203

Answers (1)

Stephen C

Reputation: 718788

In an executing program, text is (typically; see footnote 1) represented in UTF-16.

But in a ".class" file, text in the constant pool (i.e. String literals, identifiers, and so on) is encoded in UTF-8 to save space; strictly speaking, it is a slightly modified form of UTF-8. (Encoding of constant pool entries this way is mandated by the JVM spec - Section 4.4 ... and has nothing to do with default character sets.)
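To make the space saving concrete, here is a minimal sketch that compares the byte counts for the literal "hello". The class and method names are made up for illustration. It uses DataOutputStream.writeUTF, which is documented to write a 2-byte length followed by modified UTF-8; treat that as an approximation of what a Utf8 constant pool entry holds, not as a class file writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK or of the question's code.
public class ConstantPoolEncodingDemo {

    // Bytes needed to store the text as standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store the text as UTF-16 (big-endian, no byte order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes produced by writeUTF: a 2-byte length prefix plus modified UTF-8.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String s = "hello";
        System.out.println("UTF-8 bytes:          " + utf8Length(s));          // 5
        System.out.println("UTF-16 bytes:         " + utf16Length(s));         // 10
        System.out.println("writeUTF total bytes: " + modifiedUtf8(s).length); // 7
    }
}
```

For mostly-ASCII text such as identifiers and type descriptors, the UTF-8 form is about half the size of the UTF-16 form, which is consistent with the "to save space" rationale.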

When the class file is loaded, the UTF-8 constant pool entries are transcoded to UTF-16 by the classloader.
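The effect of that transcoding can be imitated in ordinary code. The sketch below is only an analogy: it decodes standard UTF-8 bytes with the String constructor and does not reproduce the class loader's actual code path or the modified UTF-8 format. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; an analogy for the transcoding, not the JVM's implementation.
public class TranscodeDemo {

    // Decode UTF-8 bytes into a String, whose chars are 16-bit UTF-16 code units.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "hello" as five single-byte UTF-8 sequences.
        byte[] helloBytes = {0x68, 0x65, 0x6C, 0x6C, 0x6F};
        String hello = decodeUtf8(helloBytes);
        System.out.println(hello + " has " + hello.length() + " code units"); // hello has 5 code units

        // U+1F600 is four bytes in UTF-8 but a surrogate pair (two chars) in UTF-16.
        byte[] emojiBytes = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        String emoji = decodeUtf8(emojiBytes);
        System.out.println("code units:  " + emoji.length());                         // 2
        System.out.println("code points: " + emoji.codePointCount(0, emoji.length())); // 1
        System.out.printf("surrogates:  %04X %04X%n",
                (int) emoji.charAt(0), (int) emoji.charAt(1));                        // D83D DE00
    }
}
```

The second case shows why the JLS speaks of "16-bit code units" rather than characters: one code point outside the Basic Multilingual Plane occupies two chars in a String.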


1 - An application could be written to encode text in myriad different ways. The UTF-16 encoding we are talking about here is the natural encoding scheme for text data in Java; i.e. the encoding you get when you store text in a String or any other subtype of CharSequence.

Upvotes: 5
