user1131435

Is it faster to drop and recreate an array, or fill it with zeroes, and why?

Let's say I create an array intended to emulate processor memory:

byte[] mem = new byte[0xF00];

This array is used over the course of the emulation operation, and will eventually (read: with frequency) need to be discarded or reset. My question is, which is faster, and why?

mem = new byte[0xF00];

or:

for(int i = 0; i < mem.length; i++) mem[i] = 0;

It may not seem significant, but when emulating a massive number of processors, small efficiencies add up. The difference in speed will come down to the JVM's garbage collection: with the first option, the old array must be discarded and garbage collected, and the JVM has to allocate (and potentially zero out?) new memory; with the second, the garbage-collection cost is avoided, but we still have to iterate over every element in the array.
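
To make the comparison concrete, here is a rough, illustrative sketch of how the two reset strategies could be timed against each other for this exact 0xF00-byte case (the class and variable names are made up for the example, and a serious measurement would use a harness such as JMH rather than raw System.nanoTime() loops):

public class ResetBenchmark {
    public static void main(String[] args) {
        final int iterations = 1_000_000; // illustrative repetition count
        byte[] mem = new byte[0xF00];
        long sink = 0; // consumed at the end so the JIT cannot discard the work

        // Option 1: drop the old array and allocate a fresh, zeroed one.
        long start = System.nanoTime();
        for (int n = 0; n < iterations; n++) {
            mem = new byte[0xF00];
            sink += mem[0];
        }
        long reallocate = System.nanoTime() - start;

        // Option 2: keep the array and overwrite every element with zero.
        start = System.nanoTime();
        for (int n = 0; n < iterations; n++) {
            for (int i = 0; i < mem.length; i++) mem[i] = 0;
            sink += mem[0];
        }
        long fill = System.nanoTime() - start;

        System.out.printf("reallocate: %,d ns, fill: %,d ns (sink=%d)%n",
                reallocate, fill, sink);
    }
}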

As additional caveats to this question:

  1. Does the cost ratio change with the size of the data type? For instance, what about a short[]?
  2. Does the length of the array impact the cost ratio?
  3. Most importantly, why?

Upvotes: 4

Views: 1457

Answers (5)

Michael Schmeißer

Reputation: 3417

Reallocating the array does not actually increase the per-GC cost, because the GC only visits and copies live objects and does nothing with dead ones. However, allocating objects does cause minor GCs to happen more frequently. But if none of the recently allocated objects are still alive, the cost of a minor GC is very low, and major GCs won't be triggered at all.

Also, object allocation is cheap in current Java versions, and the zeroing of the allocation space can safely be assumed to be the most efficient zeroing the JVM can achieve. If you manage to zero the array in your code as fast as the JVM does (Edit: as mentioned by Steven Schlansker, the JIT compiler may optimize memory-filling loops), reusing the array should be faster. Until the for-loop you illustrated is optimized by the JIT compiler, however, I expect it to be substantially slower.
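
For illustration, a minimal sketch of the in-place reset discussed above, using java.util.Arrays.fill instead of a hand-written loop (the class and method names are made up for the example; whether the JIT compiles the fill to anything faster than the manual loop depends on the JVM):

import java.util.Arrays;

class ZeroReset {
    static final byte[] mem = new byte[0xF00];

    // Reuse the existing array: overwrite every element with zero in place,
    // doing the same work as the for-loop from the question.
    static void reset() {
        Arrays.fill(mem, (byte) 0);
    }
}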

To answer your other questions:

  • The GC zeros the allocation space (Eden) all at once, so there won't be a difference whether it's a short[] or a byte[]. However, your for-loop will only need half as many iterations to zero the same number of bytes when using a short[] in place of a byte[] (setting a single byte or a single short to 0 should not make any difference).
  • The longer the array gets, the more iterations your for-loop will need, so its cost grows linearly. The GC also needs amortized linear time to zero the byte range involved, so I expect the ratio between the two approaches to remain roughly constant. However, there may be more efficient ways to zero large memory areas than small ones, which would make the zeroing done by the GC (the entire allocation space at once) more efficient than the loop approach. For very large arrays the situation may change: those are allocated directly into the tenured generation (unless G1 is used) and will therefore cause major GCs, which are much more expensive.

Upvotes: 4

Peter Lawrey

Reputation: 533530

You can test it for yourself, but dropping and recreating the array is about the same speed as zeroing it.

However, it has two downsides:

  • it causes your CPU data cache to churn, reducing its effectiveness.
  • it makes triggering a GC more likely, especially if you do this often, which pauses the system or slows it down (if the collector is concurrent).

I prefer reusing the array, not because it's the fastest, but because it has by far the least impact on the rest of your application.


// requires: import java.util.Arrays;
for (int size = 16; size <= 16 * 1024; size *= 2) {
    int count1 = 0, count1b = 0, count2 = 0;
    long total1 = 0, total1b = 0, total2 = 0;
    for (long i = 0; i < 10000000000L; i += size) {
        long start = System.nanoTime();
        long[] longs = new long[size];
        // touch the array so the allocation cannot be optimized away
        if (longs[0] + longs[longs.length - 1] != 0)
            throw new AssertionError();
        long mid = System.nanoTime();
        long time1 = mid - start;
        Arrays.fill(longs, 1L);
        long time2 = System.nanoTime() - mid;
        count1b++;
        total1b += time1;
        if (time1 < 10e3) { // skip samples distorted by a GC pause
            total1 += time1;
            count1++;
        }
        if (time2 < 10e3) { // skip samples distorted by a GC pause
            total2 += time2;
            count2++;
        }
    }
    System.out.printf("%s KB took on average of %,d ns to allocate, %,d ns to allocate including GCs and %,d ns to fill%n",
            size * 8 / 1024.0, total1 / count1, total1b / count1b, total2 / count2);
}

prints

0.125 KB took on average of 35 ns to allocate, 36 ns to allocate including GCs and 19 ns to fill
0.25 KB took on average of 39 ns to allocate, 40 ns to allocate including GCs and 31 ns to fill
0.5 KB took on average of 56 ns to allocate, 58 ns to allocate including GCs and 55 ns to fill
1.0 KB took on average of 75 ns to allocate, 77 ns to allocate including GCs and 117 ns to fill
2.0 KB took on average of 129 ns to allocate, 134 ns to allocate including GCs and 232 ns to fill
4.0 KB took on average of 242 ns to allocate, 248 ns to allocate including GCs and 368 ns to fill
8.0 KB took on average of 479 ns to allocate, 496 ns to allocate including GCs and 644 ns to fill
16.0 KB took on average of 1,018 ns to allocate, 1,055 ns to allocate including GCs and 1,189 ns to fill
32.0 KB took on average of 2,119 ns to allocate, 2,200 ns to allocate including GCs and 2,625 ns to fill
64.0 KB took on average of 4,419 ns to allocate, 4,604 ns to allocate including GCs and 4,728 ns to fill
128.0 KB took on average of 8,333 ns to allocate, 9,472 ns to allocate including GCs and 8,685 ns to fill

This proves only that it is difficult to assume one approach is faster than the other in all cases.

If I change the long[] to an int[], I see much the same:

0.125 KB took on average of 35 ns to allocate, 36 ns to allocate including GCs and 16 ns to fill
0.25 KB took on average of 40 ns to allocate, 41 ns to allocate including GCs and 24 ns to fill
0.5 KB took on average of 58 ns to allocate, 60 ns to allocate including GCs and 40 ns to fill
1.0 KB took on average of 86 ns to allocate, 87 ns to allocate including GCs and 94 ns to fill
2.0 KB took on average of 139 ns to allocate, 143 ns to allocate including GCs and 149 ns to fill
4.0 KB took on average of 256 ns to allocate, 262 ns to allocate including GCs and 206 ns to fill
8.0 KB took on average of 472 ns to allocate, 481 ns to allocate including GCs and 317 ns to fill
16.0 KB took on average of 981 ns to allocate, 999 ns to allocate including GCs and 516 ns to fill
32.0 KB took on average of 2,098 ns to allocate, 2,146 ns to allocate including GCs and 1,458 ns to fill
64.0 KB took on average of 4,312 ns to allocate, 4,445 ns to allocate including GCs and 4,028 ns to fill
128.0 KB took on average of 8,497 ns to allocate, 9,072 ns to allocate including GCs and 7,141 ns to fill

Upvotes: 5

Mikkel Løkke

Reputation: 3749

There are four factors that are important here:

  1. What is the target platform? (Does it have lots of RAM? Multiple CPU cores?)
  2. What is the largest amount of memory you plan on allocating? (Larger amounts will likely favor allocation/deallocation.)
  3. Which JVM are you planning on using?
  4. If your application is that performance critical, why are you developing it in Java?

More to the point, I would say "don't worry about premature optimization". Write the software first, then profile it, then optimize the parts that perform slowly. As a general rule, algorithm performance is a bigger issue than data structure performance, particularly when your data structure is basically just a blank address space.

Upvotes: 0

harsh

Reputation: 7692

I agree with the observation that reusing the array would have the least impact on the application, but your specific case doesn't seem to affect the GC much:

for(int i = 0; i < mem.length; i++) mem[i] = 0;

In the above loop (mem.length is 0xF00, i.e. 3,840), there would be roughly 2 × 3,840 assignments (one to mem[i] and one to i per iteration) and 3,840 comparisons.

Now, in the case of GC, when the specific object is swept or de-allocated, the whole memory chunk is dropped at once, which IMO should be faster than the work done by the loop above.

But the real GC cost to application performance comes when the application's behavior triggers too many GC cycles (even worse if they are frequent major cycles). Your specific case doesn't appear to cause that kind of GC pressure.

I think the loop approach is better for a byte[]. Had it been an Object[], we might take a different approach.
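
A minimal sketch of what a different approach for an Object[] might look like (the names here are hypothetical), assuming the goal is to release the referenced objects while still reusing the array itself:

import java.util.Arrays;

class ObjectSlots {
    // Hypothetical holder for object references instead of raw bytes.
    static final Object[] slots = new Object[0xF00];

    static void reset() {
        // Null every slot so the previously referenced objects become
        // eligible for garbage collection; the array object itself is reused.
        Arrays.fill(slots, null);
    }
}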

Upvotes: 3

Padrus

Reputation: 2063

I would definitely go for mem = new byte[0xF00]; and let the GC do the rest.

The memory usage may be a little bit higher, but it won't have an impact on your application unless you do it thousands of times per second.

The execution time will be much faster, and there is no need to manually call the GC; it will do its job anyway.

Upvotes: 1
