Reputation: 1275
Coming from languages without a GC (C, C++, Rust, ...), I am wondering what exactly happens when an array is reallocated.
In a C++-like language (pseudo code), this is considered bad:
Obj *x = xarr[2];
xarr.push(new Obj(12));
do_with(x);
Running example in C++: http://ideone.com/qk7vcj
After the push, x may point to freed memory because xarr may have been reallocated.
x is basically just a pointer-sized integer storing the memory address of xarr[2].
If I do the same in Java, it works just fine, and I am wondering why:
List<OBJ> list = new ArrayList<>();
list.add(new OBJ());
list.add(new OBJ());
list.add(new OBJ());
OBJ x = list.get(2);
for (int idx = 0; idx < 1000000; idx++) {
list.add(new OBJ());
}
do_it(x);
What exactly is x, and how and why does the memory address of x change after the array is seemingly reallocated?
Obviously Java is not deep-copying the array, because otherwise x2 could not change x as it does in this code. As you can see, the address of x changes, too.
private static class OBJ {
int one;
String two;
public OBJ() {
this.one = 1;
this.two = "two";
}
}
public static void do_it(OBJ o) {
System.out.println("o.two is: " + o.two);
}
public static void main(String[] args)
{
List<OBJ> list = new ArrayList<>();
list.add(new OBJ());
list.add(new OBJ());
list.add(new OBJ());
OBJ x = list.get(2);
printAddresses("Address x", x);
for (int idx = 0; idx < 1000000; idx++) {
list.add(new OBJ());
}
OBJ x2 = list.get(2);
x2.two = "haha";
printAddresses("Address x", x);
do_it(x);
}
If the array were deep-copied, this should not print the following:
Address x: 0x525554440
Address x: 0x550882b80
o.two is: haha
A full working example can be found here: http://ideone.com/P3j6xF
So that raises the question: how does the address of x change after the reallocation of the list? And what exactly is the so-called "reference"? I thought a "reference" in Java was just an ordinary pointer with automatic dereferencing and no pointer arithmetic, because in Java everything is passed by value and not by reference. This is clearly evident in this code: http://ideone.com/k4Ijq0
public static void test1(OBJ o) {
o.one = 2;
}
public static void test2(OBJ o) {
o = new OBJ();
o.two = "no reference";
}
public static void main (String[] args) throws java.lang.Exception
{
OBJ x = new OBJ();
test1(x);
test2(x);
System.out.println("x.one: " + x.one + " x.two: " + x.two);
}
printing out
x.one: 2 x.two: two
So it seems that x behaves like a pointer, but somehow Java redirects it when necessary. How does this work? The term "reference" is extra confusing; why is it called that?
Upvotes: 2
Views: 332
Reputation: 18569
The reallocation of the list doesn't change the value of x. In Java, x will contain a reference to the created object. If the array backing the list is reallocated, then x is still a reference to the same object.
What you're seeing is the addresses of objects changing because of the garbage collector. You can see the same results where x is not in the list at all:
public static void main(String[] args) {
List<OBJ> list = new ArrayList<>(10000000);
OBJ x = new OBJ();
printAddresses("Address x", x);
for (int idx = 0; idx < 1000000; idx++) {
list.add(new OBJ());
}
printAddresses("Address x", x);
}
Output:
Address x: 0x710b05580
Address x: 0x54d5a19c0
Objects can be moved around in memory as the garbage collector does its work. When this happens, every reference to a moved object is updated at the same time.
Also, in C++, your value of x is a pointer to an element within the list, so if the list is reallocated that pointer becomes invalid. In Java, x is a copy of an item in the list (the item itself being a reference to an object), so it doesn't matter if the list is reallocated. It's not possible to have a reference to a list element itself in Java.
List<OBJ> in Java is really a list of references to objects. These objects exist independently of the list. You can take a copy of one of these references to get a new reference to the same object.
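To make that concrete, here is a minimal sketch (not the asker's code; the class name ReferenceSurvivesGrowth and the small Obj class are hypothetical stand-ins) that checks with == whether a reference taken before the list grows still refers to the same object the list holds afterwards:

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceSurvivesGrowth {
    // Hypothetical stand-in for the OBJ class from the question.
    static class Obj {
        String two = "two";
    }

    // Returns true if the reference taken before the growth still refers
    // to the same object the list holds at that index afterwards.
    static boolean sameObjectAfterGrowth() {
        List<Obj> list = new ArrayList<>(4); // small capacity, so the list must grow
        for (int i = 0; i < 3; i++) {
            list.add(new Obj());
        }
        Obj x = list.get(2);                 // copies the reference, not the object
        for (int i = 0; i < 100_000; i++) {
            list.add(new Obj());             // the backing array is replaced several times
        }
        list.get(2).two = "haha";            // mutate through the list's copy of the reference
        // == on references compares identity: same object, not equal contents.
        return x == list.get(2) && "haha".equals(x.two);
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAfterGrowth());
    }
}
```

The == comparison never looks at raw addresses, which is why it keeps working whether or not the collector has moved the object in the meantime.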
Upvotes: 0
Reputation: 279960
The Java Virtual Machine Specification states
There are three kinds of reference types: class types, array types, and interface types. Their values are references to dynamically created class instances, arrays, or class instances or arrays that implement interfaces, respectively.
Similarly, the Java Language Specification states
The reference values (often just references) are pointers to these objects, and a special null reference, which refers to no object.
In other words, values for reference types are (more or less) the address of a corresponding object. This is obviously abstracted away from you, the Java developer. You never need to know where an object is in memory because you don't manage memory. The JVM does that.
When you do this
OBJ x = new OBJ();
or get the reference value some other way
OBJ x = list.get(2);
The variable x simply holds that reference value, which points to the actual object (or is potentially the null reference).
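As a small illustration of "the variable holds a reference value", the sketch below (class and method names are made up for this example) shows that assigning or passing a variable copies the reference value: mutating through the copy affects the shared object, while reassigning the copy does not affect the caller's variable.

```java
class ReferenceValueDemo {
    // Hypothetical stand-in for the OBJ class from the question.
    static class Obj {
        int one = 1;
    }

    // Follows the copied reference and changes the shared object.
    static void mutate(Obj o) {
        o.one = 2;
    }

    // Overwrites only the local copy of the reference; the caller's
    // variable still refers to the original object.
    static void reassign(Obj o) {
        o = new Obj();
        o.one = 99;
    }

    static int run() {
        Obj x = new Obj();
        Obj alias = x;   // second variable holding the same reference value
        mutate(alias);   // visible through x, since both refer to one object
        reassign(x);     // has no effect on x
        return x.one;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```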
Java is a garbage collected language. Modern garbage collection algorithms use generational and copying strategies. That is, they'll move around objects between generations as they decide how long-lived those objects are. That move is a copy and clear. The GC will go through a dedicated area, copy all live objects to another area and mark the original as free memory.
This is obviously problematic for our previously mentioned x variable. If it were pointing to a live object in memory and that memory was "cleared", we'd be setting ourselves up for problems. The GC therefore has to go through all the variables (instance variables, local variables, array elements) that stored the location of a moved object and update them before allowing the program to proceed (done during Stop The World collections).
This is what you see with your Unsafe code.
OBJ x = list.get(2);
printAddresses("Address x", x);
The object referenced by the value stored in x is in a certain location in memory when you first invoke printAddresses. After generating a bunch of new objects, triggering the garbage collector, the object is moved to a new location and all references to it are updated (the value in x, the value in the ArrayList's underlying array). If you had more memory (or created fewer objects), this would not have occurred (yet).
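Without Unsafe, ordinary Java code cannot observe such a move at all. The following sketch (the class name is hypothetical, and System.gc() is only a request that the JVM may ignore) allocates a lot of garbage and then checks that the identity of the object, as seen through == and System.identityHashCode, is unchanged:

```java
import java.util.ArrayList;
import java.util.List;

class IdentityAcrossGc {
    // Returns true if the object's observable identity is the same
    // before and after heavy allocation and a requested collection.
    static boolean identityStable() {
        Object x = new Object();
        int before = System.identityHashCode(x);

        List<Object> list = new ArrayList<>();
        list.add(x);
        for (int i = 0; i < 200_000; i++) {
            list.add(new Object());   // allocation pressure; may trigger a collection
        }
        System.gc();                  // a hint only; the JVM is free to ignore it

        int after = System.identityHashCode(x);
        // Whether or not x was physically moved, both views of it agree.
        return before == after && list.get(0) == x;
    }

    public static void main(String[] args) {
        System.out.println(identityStable());
    }
}
```

Whether the object was actually relocated during this run is not visible from here, which is the point: the language only exposes identity, never location.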
How does Array reallocation work in Java?
This has nothing to do with the array, really. The ArrayList object contains an array field (named elementData) which references an array object. For example
elementData = 0x4000
and that object, internally, has references to other objects (array elements are variables).
elementData[0] = 0x6720
elementData[1] = 0x6808
elementData[2] = 0x4393
elementData[3] = 0x7121
elementData[4] = 0x2425
elementData[5] = 0x4867
elementData[6] = 0x976
elementData[7] = 0x1082
elementData[8] = 0x4160
elementData[9] = 0x1850
When you hit that element limit and ArrayList has to reallocate the array, it simply copies over all those reference values to a new array.
elementData = 0x8900;
elementData[0] = 0x6720 (same as above)
elementData[1] = 0x6808
elementData[2] = 0x4393
elementData[3] = 0x7121
elementData[4] = 0x2425
elementData[5] = 0x4867
elementData[6] = 0x976
elementData[7] = 0x1082
elementData[8] = 0x4160
elementData[9] = 0x1850
elementData[10] = 0x0000 (something for null)
...
elementData[newLength-1] = 0x0000
assuming, of course, that none of these objects were moved during a garbage collection cycle. If they had been, the GC would have updated the array elements as well.
Again, though, as a Java developer, you shouldn't need to care about any of this. It'll very rarely come in handy when writing Java code. You never have access to the actual reference value directly (except when playing with Unsafe).
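The copy step can be sketched as a tiny growable list. This is a simplified illustration, not the real ArrayList source: the class TinyList, its doubling growth policy, and its starting capacity of 2 are assumptions made for the example. Only the field name elementData is borrowed from the description above.

```java
import java.util.Arrays;

class TinyList {
    private Object[] elementData = new Object[2]; // deliberately small initial capacity
    private int size = 0;

    void add(Object o) {
        if (size == elementData.length) {
            // Allocate a bigger array and copy the reference values into it.
            // The referenced objects themselves are neither copied nor touched.
            elementData = Arrays.copyOf(elementData, elementData.length * 2);
        }
        elementData[size++] = o;
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elementData[index];
    }

    int capacity() {
        return elementData.length;
    }

    // Adds 11 elements, forcing three reallocations (2 -> 4 -> 8 -> 16),
    // and checks that slot 0 still refers to the very same object.
    static boolean demo() {
        TinyList list = new TinyList();
        Object first = new Object();
        list.add(first);
        for (int i = 0; i < 10; i++) {
            list.add(new Object());
        }
        return list.get(0) == first && list.capacity() == 16;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```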
Upvotes: 1