Reputation:
Let's say I create an array intended to emulate processor memory:
byte[] mem = new byte[0xF00];
This array is used over the course of the emulation and will eventually (read: frequently) need to be discarded or reset. My question is: which is faster, and why?
mem = new byte[0xF00];
or:
for(int i = 0; i < mem.length; i++) mem[i] = 0;
It may not seem significant, but when emulating a massive number of processors, small efficiencies make a difference. The difference in speed will come down to the JVM's garbage collection: in the first, the old array must be dumped and garbage collected, and the JVM has to allocate (and potentially zero out?) new memory; in the second, the GC cost is avoided, but we still have to iterate over every element of the array.
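For completeness, a third option would be the standard library's fill, which I would assume is equivalent to the manual loop:
java.util.Arrays.fill(mem, (byte) 0); // fills every element with 0, same as the loop above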
As an additional caveat to this question: does the answer change if the array is a short[] instead of a byte[]?
Upvotes: 4
Views: 1457
Reputation: 3417
Reallocating the array actually does not increase the per-GC cost, because the GC only visits and copies live objects and does nothing with dead objects. However, allocating objects causes minor GCs to happen more frequently. Still, if none of the recently allocated objects are alive anymore, the cost of a minor GC is very low, and no major GCs will be triggered at all.
Also, object allocation is cheap in current Java versions, and the zeroing of the allocation space can safely be assumed to be the most efficient zeroing the JVM can achieve. If you manage to zero the array in your code as fast as the JVM does (Edit: as mentioned by Steven Schlansker, the JIT compiler may optimize memory-filling loops), reusing the array should be faster. In any case, until the for-loop you showed is optimized by the JIT compiler, I would expect it to be substantially slower.
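As a minimal sketch of the reuse approach (the class and method names here are illustrative, not from the question):
import java.util.Arrays;

// Illustrative wrapper: one backing array per emulated processor,
// reset in place instead of reallocated.
final class EmulatedMemory {
    private final byte[] mem = new byte[0xF00];

    byte read(int addr)          { return mem[addr]; }
    void write(int addr, byte b) { mem[addr] = b; }

    // Arrays.fill is a plain loop the JIT can optimize well,
    // so this should be competitive with the JVM's own zeroing.
    void reset() { Arrays.fill(mem, (byte) 0); }
}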
To answer your additional question:
Allocating a short[] should not cost more than allocating a byte[] covering the same number of bytes. However, your for-loop will only need half the number of iterations to zero the same amount of memory when using a short[] in place of a byte[] (setting a single byte or short to 0 should not make any difference).
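To illustrate the iteration counts (array sizes taken from the question):
byte[] memB = new byte[0xF00];        // 3840 bytes      -> 3840 iterations
for (int i = 0; i < memB.length; i++) memB[i] = 0;

short[] memS = new short[0xF00 / 2];  // same 3840 bytes -> 1920 iterations
for (int i = 0; i < memS.length; i++) memS[i] = 0;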
Upvotes: 4
Reputation: 533530
You can test it for yourself, but dropping and recreating the array takes about the same time.
However, it has two downsides: it generates garbage that has to be collected eventually, and the occasional collection pause makes the timing less predictable (see the "including GCs" column below).
I prefer reusing the array, not because it's the fastest, but because it has by far the least impact on the rest of your application.
import java.util.Arrays;

public class AllocVsFill {
    public static void main(String[] args) {
        for (int size = 16; size <= 16 * 1024; size *= 2) {
            int count1 = 0, count1b = 0, count2 = 0;
            long total1 = 0, total1b = 0, total2 = 0;
            for (long i = 0; i < 10000000000L; i += size) {
                long start = System.nanoTime();
                long[] longs = new long[size];
                // touch first and last element so the allocation cannot be optimized away
                if (longs[0] + longs[longs.length - 1] != 0)
                    throw new AssertionError();
                long mid = System.nanoTime();
                long time1 = mid - start;
                Arrays.fill(longs, 1L);
                long time2 = System.nanoTime() - mid;
                count1b++;
                total1b += time1;
                if (time1 < 10e3) { // skip samples over 10 us, which likely include a GC pause
                    total1 += time1;
                    count1++;
                }
                if (time2 < 10e3) { // same cut-off for the fill timing
                    total2 += time2;
                    count2++;
                }
            }
            System.out.printf("%s KB took on average of %,d ns to allocate, %,d ns to allocate including GCs and %,d ns to fill%n",
                    size * 8 / 1024.0, total1 / count1, total1b / count1b, total2 / count2);
        }
    }
}
prints
0.125 KB took on average of 35 ns to allocate, 36 ns to allocate including GCs and 19 ns to fill
0.25 KB took on average of 39 ns to allocate, 40 ns to allocate including GCs and 31 ns to fill
0.5 KB took on average of 56 ns to allocate, 58 ns to allocate including GCs and 55 ns to fill
1.0 KB took on average of 75 ns to allocate, 77 ns to allocate including GCs and 117 ns to fill
2.0 KB took on average of 129 ns to allocate, 134 ns to allocate including GCs and 232 ns to fill
4.0 KB took on average of 242 ns to allocate, 248 ns to allocate including GCs and 368 ns to fill
8.0 KB took on average of 479 ns to allocate, 496 ns to allocate including GCs and 644 ns to fill
16.0 KB took on average of 1,018 ns to allocate, 1,055 ns to allocate including GCs and 1,189 ns to fill
32.0 KB took on average of 2,119 ns to allocate, 2,200 ns to allocate including GCs and 2,625 ns to fill
64.0 KB took on average of 4,419 ns to allocate, 4,604 ns to allocate including GCs and 4,728 ns to fill
128.0 KB took on average of 8,333 ns to allocate, 9,472 ns to allocate including GCs and 8,685 ns to fill
This proves only that it is difficult to assume one approach is faster than the other in all cases.
If I change the long[] to an int[], I see much the same:
0.125 KB took on average of 35 ns to allocate, 36 ns to allocate including GCs and 16 ns to fill
0.25 KB took on average of 40 ns to allocate, 41 ns to allocate including GCs and 24 ns to fill
0.5 KB took on average of 58 ns to allocate, 60 ns to allocate including GCs and 40 ns to fill
1.0 KB took on average of 86 ns to allocate, 87 ns to allocate including GCs and 94 ns to fill
2.0 KB took on average of 139 ns to allocate, 143 ns to allocate including GCs and 149 ns to fill
4.0 KB took on average of 256 ns to allocate, 262 ns to allocate including GCs and 206 ns to fill
8.0 KB took on average of 472 ns to allocate, 481 ns to allocate including GCs and 317 ns to fill
16.0 KB took on average of 981 ns to allocate, 999 ns to allocate including GCs and 516 ns to fill
32.0 KB took on average of 2,098 ns to allocate, 2,146 ns to allocate including GCs and 1,458 ns to fill
64.0 KB took on average of 4,312 ns to allocate, 4,445 ns to allocate including GCs and 4,028 ns to fill
128.0 KB took on average of 8,497 ns to allocate, 9,072 ns to allocate including GCs and 7,141 ns to fill
Upvotes: 5
Reputation: 3749
There are 4 factors that are important here.
1) What is the target platform? (Does it have lots of RAM? Multiple CPU cores?)
2) What is the largest amount of memory you plan on allocating? (Larger amounts will likely favor allocation/deallocation.)
3) Which JVM are you planning on using?
4) If your application is that performance-critical, why are you developing it in Java?
More to the point, I would say: don't worry about premature optimization. Write the software first, then profile it, then optimize the parts that perform slowly. As a general rule, algorithm performance is a bigger issue than data structure performance, particularly when your data structure is basically just a blank addressing space.
Upvotes: 0
Reputation: 7692
I agree with the observation that reusing the array would have the least impact on the application, but your specific case doesn't seem to affect the GC much:
for(int i = 0; i < mem.length; i++) mem[i] = 0;
In the above loop (where mem.length is 61440), there would be 2 * 61440 assignments and 61440 comparisons.
Now consider the GC side: during the sweep/deallocation phase, the whole memory chunk of the dead object is reclaimed at once, which IMO should be cheaper than the per-element work counted above.
But the real GC cost to application performance appears when the application's behavior causes too many GC cycles (even worse if they are frequent major cycles). Your specific case doesn't appear to cause that much additional GC.
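If you want to check this, the JVM can log its collections; running the emulator (the class name below is a placeholder) with the standard -verbose:gc flag shows whether the reallocation pattern actually triggers extra cycles:
java -verbose:gc YourEmulator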
I think the loop approach would be better for a byte[]. Had it been an Object[], we might need a different approach.
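For example, clearing an Object[] has to null out the references so the GC can reclaim the objects they point to, whereas zeroing a byte[] frees nothing (a minimal sketch):
Object[] slots = new Object[0xF00];
// ... use slots ...
java.util.Arrays.fill(slots, null); // drop the references; the referenced objects become collectable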
Upvotes: 3
Reputation: 2063
I would definitely go for mem = new byte[0xF00]; and let the GC do the rest.
The memory usage may be a little bigger, but it won't have an impact on your application unless you do it thousands of times per second.
The execution time will be much faster, and there is no need to call the GC manually; it will do its job anyway.
Upvotes: 1