Reputation: 40
I learned about the Java String Pool recently, and there's a few things that I don't quiet understand.
When using the assignment operator, a new String will be created in the String Pool if it doesn't exist there already.
String a = "foo"; // Creates a new string in the String Pool
String b = "foo"; // Refers to the already existing string in the String Pool
When using the String constructor, I understand that regardless of the String Pool's state, a new string will be created in the heap, outside of the String Pool.
String c = new String("foo"); // Creates a new string in the heap
I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.
String d = new String("bar"); // Creates a new string in the String Pool and in the heap
I didn't find any further information about this, but I would like to know if that's true.
If that is indeed true, then - why? Why does java create this duplicate string? It seems completely redundant to me since the strings in java are immutable.
Another thing that I would like to know is how the .intern() function of the String class works: Does it just return a pointer to the string in the String Pool?
And finally, in the following code:
String s = new String("Hello");
s = s.intern();
Will the garbage collector delete the string that is outside the String Pool from the heap?
Upvotes: 1
Views: 1139
Reputation: 718778
Q: I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap. [] I didn't find any further information about this, but I would like to know if that's true.
It is NOT true. A string created with new
is not placed in the string pool ... unless something explicitly calls intern()
on it.
Q: Why does java create this duplicate string?
Because the JLS specifies that every new
generates a new object. It would be counter-intuitive if it didn't (IMO).
The fact that it is nearly always a bad idea to use new String(String)
is not a good reason to make new
behave differently in this case. The real answer is that programmers should learn not to write that ... except in the extremely rare cases that that it is necessary to do that.
Q: Another thing that I would like to know is how the
intern()
function of the String class works: Does it just return a pointer to the string in the String Pool?
The intern
method always returns a pointer to a string in the string pool. That string may or may not be the string you called intern()
or.
There have been different ways that the string pool was implemented.
In the original scheme, interned strings were held in a special heap call the PermGen heap. In that scheme, if the string you were interning was not already in the pool, then a new string would be allocated in PermGen space, and the intern
method would return that.
In the current scheme, interned strings are held in the normal heap, and the string pool is just a (private) data structure. When the string being interned a not in the pool, it is simply linked into the data structure. A new string does not need to be allocated.
Q: Will the garbage collector delete the string that is outside the String Pool from the heap?
The rule is the same for all Java objects, no matter how they were created, and irrespective of where (in which "space" or "heap" in the JVM) they reside.
If an object is not reachable from the running application, then it is eligible for deletion by the garbage collector.
That doesn't mean that an unreachable object will be be garbage collected in any particular run of the GC. (Or indeed ever ... in some circumstances.)
The above rule equally applies to the String
objects that correspond to string literals. If it ever becomes possible that a literal can never be used again, then it may be garbage collected.
That doesn't normally happen. The JVM keeps a hidden references to each string literal object in a private data structure associated with the class that defined it. Since classes normally exists for the lifetime of the JVM, their string literal objects remain reachable. (Which makes sense ... since the application may need to use them.)
However, if a class is loaded using a dynamically created classloader, and that classloader becomes unreachable, then so will all of its classes. So it is actually possible for a string literal object to become unreachable. If it does, it may be garbage collected.
Upvotes: 2
Reputation: 298143
You wrote
String c = new String("foo"); // Creates a new string in the heap
I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.
That’s somewhat correct, but you have to read the code correctly. Your code contains two String
instances. First, you have the string literal "foo"
that evaluates to a String
instance, the one that will be inserted into the pool. Then, you are creating a new String
instance explicitly, using new String(…)
calling the String(String)
constructor. Since the explicitly created object can’t have the same identity as an object that existed prior to its creation, two String
instances must exist.
Why does java create this duplicate string? It seems completely redundant to me since the strings in java are immutable.
Well it does so, because you told it so. In theory, this construction could get optimized, skipping the intermediate step that you can’t perceive anyway. But the first assumption for a program’s behavior should be that it does precisely what you have written.
You could ask why there’s a constructor that allows such a pointless operation. In fact, this has been asked before and this answer addresses this. In short, it’s mostly a historical design mistake, but this constructor has been used in practice for other technical reasons; some do not apply anymore. Still, it can’t be removed without breaking compatibility.
String s = new String("Hello"); s = s.intern();
Will the garbage collector delete the string that is outside the String Pool from the heap?
Since the intern()
call will evaluate to the instance that had been created for "Hello"
and is distinct from the instance created via new String(…)
, the latter will definitely be unreachable after the second assignment to s
. Of course, this doesn’t say whether the garbage collector will reclaim the string’s memory only that it is allowed to do so. But keep in mind that the majority of the heap occupation will be the array that holds the character data, which will be shared between the two string instances (unless you use a very outdated JVM). This array will still be in use as long as either of the two strings is in use. Recent JVMs even have the String Deduplication feature that may cause other strings of the same contents in the JVM use this array (to allow collection of their formerly used array). So the lifetime of the array is entirely unpredictable.
Upvotes: 6