john ding

Reputation: 397

Does String.intern have anything to do with the JVM run-time constant pool?

According to JVMS 11, there are two kinds of entry in the run-time constant pool: symbolic references, which may later be resolved, and static constants, which require no further processing. The static constants in the run-time constant pool are also derived from entries in the constant_pool table in accordance with the structure of each entry.

My understanding is that String.intern has nothing to do with the JVM run-time constant pool, but one of the most respected programmers in China wrote in his popular book that String.intern would "put something in the Run-Time Constant pool during runtime". We argued about this for a long time, and neither of us could convince the other.

So my question is: does String.intern have anything to do with the JVM run-time constant pool?

Here is JVMS §5.1 on the run-time constant pool.

Upvotes: 0

Views: 249

Answers (1)

Holger

Reputation: 298183

The JVMS §5.1 says

The Java Virtual Machine maintains a run-time constant pool for each class and interface (§2.5.5).

The term “for each” means that there is not one run-time constant pool, but each class or interface has its own dedicated pool. The subsequent sentences make it even clearer that this data structure corresponds to the constant pool of a class file.

The constant_pool table in the binary representation of a class or interface (§4.4) is used to construct the run-time constant pool upon class or interface creation (§5.3).

It should be clear that this per-class structure is not the same as the single global data structure used to canonicalize string instances throughout the entire runtime.

But since all uses of string constants in a class file are expressed as indices into the class file’s constant pool which is then used to construct the class’s run-time constant pool, there is a relationship between these uses and the global data structure. As stated within §5.1

The static constants in the run-time constant pool are also derived from entries in the constant_pool table in accordance with the structure of each entry:

  • A string constant is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3). To derive a string constant, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure:
    • If the method String.intern has previously been invoked on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the string constant is a reference to that same instance of class String.
    • Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure. The string constant is a reference to the new instance. Finally, the method String.intern is invoked on the new instance.

So formally, the String instances corresponding to the string constants used in a class are part of that class’s run-time constant pool, but they are initialized in terms of String.intern to ensure that each class has canonicalized string instances in its pool.
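To illustrate the quoted derivation rule, here is a minimal sketch. The class and method names are made up for this example. It relies only on behavior guaranteed by the specification: a string literal evaluates to the canonical instance, so it is identical to what intern() returns for equal content, while a freshly constructed String is a distinct object.

```java
public class LiteralInternDemo {
    // Builds "hello" at run time, so the result is a fresh, non-canonical instance.
    static String buildAtRuntime() {
        return new String(new char[] {'h', 'e', 'l', 'l', 'o'});
    }

    // A freshly constructed String is never the same object as the literal.
    static boolean freshInstanceIsLiteral() {
        return buildAtRuntime() == "hello";
    }

    // intern() returns the canonical instance, which is what the literal refers to.
    static boolean internedInstanceIsLiteral() {
        return buildAtRuntime().intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(freshInstanceIsLiteral());    // false
        System.out.println(internedInstanceIsLiteral()); // true
    }
}
```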

But this relationship has only one direction. When application code invokes String.intern() explicitly, it won’t access a class’s run-time constant pool. It wouldn’t even be clear which run-time constant pool should be accessed.

So intern() has nothing to do with the run-time constant pool, at least not more than it has to do with every other caller.
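A small sketch of that point, with hypothetical class names: two nested classes are compiled to separate class files, each with its own constant pool, yet their equal literals resolve to one shared instance. An explicit intern() call from a third class yields that same instance without naming either class's pool.

```java
public class SharedPoolDemo {
    // Each nested class gets its own class file and its own constant pool entry for "shared".
    static class First {
        static String literal() { return "shared"; }
    }

    static class Second {
        static String literal() { return "shared"; }
    }

    // Both literals resolve to the same canonical String instance.
    static boolean sameInstanceAcrossClasses() {
        return First.literal() == Second.literal();
    }

    // An explicit intern() call returns that canonical instance as well.
    static boolean explicitInternMatchesBoth() {
        String interned = new StringBuilder("sha").append("red").toString().intern();
        return interned == First.literal() && interned == Second.literal();
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAcrossClasses()); // true
        System.out.println(explicitInternMatchesBoth()); // true
    }
}
```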

A source of confusion is the fact that the data structure used by the JVM to implement intern() has no name in the JVMS or JLS at all. So, without a formal name, different names appear in different media. E.g., the API documentation of intern() says

A pool of strings, initially empty, is maintained privately by the class String.

It’s typically some kind of hash table, but the term “pool” matches its purpose and since it exists at runtime, it’s not surprising that people come up with terms easy to confuse with the run-time constant pools of JVMS §5.1.

So before starting a heated discussion with another developer, it’s important to clarify whether everyone is talking about the same pool.


As an addendum, I said above that the String is formally part of the run-time constant pool because this touches on implementation-specific aspects.

In principle, a JVM could initialize all string entries of a class’s run-time constant pool with String instances when creating the pool. But as this answer demonstrates, this is not the case for the widely used HotSpot JVM, which looks up or creates a String instance on first use rather than at class initialization time.

This implies that the run-time constant pool contains the raw character data in some form instead of a reference to a String instance. Once the String is constructed, it is referenced and reused by the code, but whether the run-time constant pool is then modified to reference the String or the bytecode instruction (ldc) keeps the reference is not observable from the Java application. All we can observe is that the String instance does not get garbage collected as long as the class exists.
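The lazy behavior can be probed with a sketch like the following. The class name and the string content are arbitrary choices for this example, and the printed values are implementation-specific, so no output is claimed here. The idea is to intern a run-time-built string before the equal literal is first executed; on a JVM that resolves literals lazily, the literal is then expected to resolve to the instance built at run time.

```java
public class LazyLiteralDemo {
    // The literal is only evaluated when this method is first called.
    static String literal() {
        return "lazy-literal-demo-7f3a";
    }

    // Returns { builtBecameCanonical, literalIsBuilt }.
    static boolean[] observe() {
        // Two separate literals appended at run time; javac does not fold them into one constant.
        String built = new StringBuilder("lazy-literal-").append("demo-7f3a").toString();

        // True only if no equal string had been interned before this call.
        boolean builtBecameCanonical = built.intern() == built;

        // True only if the literal resolves to the instance created above.
        boolean literalIsBuilt = literal() == built;

        return new boolean[] {builtBecameCanonical, literalIsBuilt};
    }

    public static void main(String[] args) {
        boolean[] result = observe();
        System.out.println("built string became canonical: " + result[0]);
        System.out.println("literal is the built instance: " + result[1]);
    }
}
```

Whatever the JVM does, the two values always agree, because the literal must resolve to whichever instance intern() treats as canonical. Which value they share depends on when the JVM resolves the literal.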

Upvotes: 3
