Reputation: 11
I'm trying to understand the results of some simple tests. First, the code with arrays:
public class TestFillingArrayOfIntegers {
    public static void main(String[] args) {
        Integer[] intArray = new Integer[20_000_000];
        fill(intArray);
        fill(intArray);
        fill(intArray);
        fill(intArray);
        fill(intArray);
    }

    static void fill(Integer[] in) {
        long startTime = System.nanoTime();
        for (int i = 0; i < 20_000_000; i++) { in[i] = i; }
        System.out.println((System.nanoTime() - startTime) / 1_000_000 + " ms");
    }
}
It results in these numbers on my machine (Oracle Java 8 on an Intel i5 desktop running Windows 10):
4442 ms 6634 ms 1038 ms 7745 ms 1210 ms
Now the code with ArrayList:
public class TestFillingArrayListOfIntegers {
    public static void main(String[] args) {
        java.util.ArrayList<Integer> intList = new java.util.ArrayList<>();
        fill(intList);
        intList.clear();
        fill(intList);
        intList.clear();
        fill(intList);
        intList.clear();
        fill(intList);
        intList.clear();
        fill(intList);
    }

    static void fill(java.util.ArrayList<Integer> in) {
        long startTime = System.nanoTime();
        for (int i = 0; i < 20_000_000; i++) { in.add(i); }
        System.out.println((System.nanoTime() - startTime) / 1_000_000 + " ms");
    }
}
Which results in:
5155 ms 965 ms 7415 ms 93 ms 902 ms
I've also tried Float, with the same results. The numbers are almost the same from run to run.
Filling arrays of primitive types (e.g. int[]) is always almost instant.
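For reference, the int[] variant is essentially the same code without the wrapper type, roughly:

public class TestFillingArrayOfInts {
    public static void main(String[] args) {
        int[] intArray = new int[20_000_000];
        for (int n = 0; n < 5; n++) { fill(intArray); }
    }

    static void fill(int[] in) {
        long startTime = System.nanoTime();
        // No boxing: each int value is stored directly in the array.
        for (int i = 0; i < 20_000_000; i++) { in[i] = i; }
        System.out.println((System.nanoTime() - startTime) / 1_000_000 + " ms");
    }
}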
I just can't understand why the numbers from successive invocations of the same method are so different (I'm not asking here about the absolute performance numbers). My only guess: filling arrays with wrappers uses all the CPU cores of my machine, so maybe the cause is unstable multithreading.
And I know about flawed benchmarks and that using JMH is preferable. I just think my primitive tests are valid; please tell me if they are not.
Thanks.
=== [EDIT] added JMH tests ===
# JMH version: 1.19
# VM version: JDK 1.8.0_151, VM 25.151-b12
# VM invoker: C:\Program Files\Java\jre1.8.0_151\bin\java.exe
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.testMethod
First the code for arrays:
package org.sample;

import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@State(Scope.Thread)
public class MyBenchmark {

    private Integer[] intArray;

    @Setup
    public void setup() {
        intArray = new Integer[20_000_000];
    }

    @Benchmark
    public void testMethod() {
        for (int i = 0; i < 20_000_000; i++) { intArray[i] = i; }
    }
}
And results:
# Run progress: 0,00% complete, ETA 00:00:30
# Fork: 1 of 3
# Warmup Iteration 1: 1607,614 ms/op
# Warmup Iteration 2: 7922,442 ms/op
# Warmup Iteration 3: 7430,643 ms/op
# Warmup Iteration 4: 1067,362 ms/op
# Warmup Iteration 5: 1257,112 ms/op
Iteration 1: 660,460 ms/op
Iteration 2: 1222,175 ms/op
Iteration 3: 664,795 ms/op
Iteration 4: 453,940 ms/op
Iteration 5: 460,370 ms/op
# Run progress: 33,33% complete, ETA 00:00:52
# Fork: 2 of 3
# Warmup Iteration 1: 1621,263 ms/op
# Warmup Iteration 2: 8021,981 ms/op
# Warmup Iteration 3: 7497,249 ms/op
# Warmup Iteration 4: 1052,803 ms/op
# Warmup Iteration 5: 1225,479 ms/op
Iteration 1: 642,912 ms/op
Iteration 2: 629,243 ms/op
Iteration 3: 644,419 ms/op
Iteration 4: 625,221 ms/op
Iteration 5: 449,515 ms/op
# Run progress: 66,67% complete, ETA 00:00:26
# Fork: 3 of 3
# Warmup Iteration 1: 1616,155 ms/op
# Warmup Iteration 2: 7972,240 ms/op
# Warmup Iteration 3: 7462,278 ms/op
# Warmup Iteration 4: 1039,186 ms/op
# Warmup Iteration 5: 1199,929 ms/op
Iteration 1: 635,411 ms/op
Iteration 2: 620,902 ms/op
Iteration 3: 635,565 ms/op
Iteration 4: 618,084 ms/op
Iteration 5: 443,779 ms/op
Result "org.sample.MyBenchmark.testMethod":
627,119 ±(99.9%) 198,033 ms/op [Average]
(min, avg, max) = (443,779, 627,119, 1222,175), stdev = 185,240
CI (99.9%): [429,087, 825,152] (assumes normal distribution)
# Run complete. Total time: 00:01:19
Benchmark Mode Cnt Score Error Units
MyBenchmark.testMethod avgt 15 627,119 ± 198,033 ms/op
It seems the results are not that stable; an error of about a third of the score is way too much. The first-run numbers are just awful.
Now the code for ArrayList:
package org.sample;

import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@State(Scope.Thread)
public class MyBenchmark {

    private java.util.ArrayList<Integer> intList;

    @Setup
    public void setup() {
        intList = new java.util.ArrayList<>();
    }

    @Benchmark
    public void testMethod() {
        for (int i = 0; i < 20_000_000; i++) { intList.add(i); }
    }
}
Result:
# Run progress: 0,00% complete, ETA 00:00:30
# Fork: 1 of 3
# Warmup Iteration 1: 6058,794 ms/op
# Warmup Iteration 2: 11194,466 ms/op
# Warmup Iteration 3: 1442,472 ms/op
# Warmup Iteration 4: 13665,291 ms/op
# Warmup Iteration 5: 1666,268 ms/op
Iteration 1: 3014,773 ms/op
Iteration 2: 3486,813 ms/op
Iteration 3: 41237,327 ms/op
Iteration 4: 1295,759 ms/op
Iteration 5: 28381,385 ms/op
# Run progress: 33,33% complete, ETA 00:04:17
# Fork: 2 of 3
# Warmup Iteration 1: 5965,381 ms/op
# Warmup Iteration 2: 11372,674 ms/op
# Warmup Iteration 3: 1483,890 ms/op
# Warmup Iteration 4: 13688,102 ms/op
# Warmup Iteration 5: 1699,179 ms/op
Iteration 1: 3055,685 ms/op
Iteration 2: 3433,376 ms/op
Iteration 3: 41953,165 ms/op
Iteration 4: 1316,909 ms/op
Iteration 5: 28855,626 ms/op
# Run progress: 66,67% complete, ETA 00:02:09
# Fork: 3 of 3
# Warmup Iteration 1: 6003,560 ms/op
# Warmup Iteration 2: 11353,880 ms/op
# Warmup Iteration 3: 1443,714 ms/op
# Warmup Iteration 4: 13688,473 ms/op
# Warmup Iteration 5: 2285,464 ms/op
Iteration 1: 3571,613 ms/op
Iteration 2: 4179,211 ms/op
Iteration 3: 41793,050 ms/op
Iteration 4: 1323,737 ms/op
Iteration 5: 28539,350 ms/op
Result "org.sample.MyBenchmark.testMethod":
15695,852 ±(99.9%) 18165,612 ms/op [Average]
(min, avg, max) = (1295,759, 15695,852, 41953,165), stdev = 16992,124
CI (99.9%): [≈ 0, 33861,464] (assumes normal distribution)
# Run complete. Total time: 00:06:30
Benchmark Mode Cnt Score Error Units
MyBenchmark.testMethod avgt 15 15695,852 ± 18165,612 ms/op
The ArrayList results are even worse than the ones without JMH. The error is bigger than the score itself!
Any ideas?
Here are the results with some non-default GC settings. I tested the array version again, changing just one line to:

@Fork(value = 3, jvmArgsAppend = { "-Xms4096m", "-Xmx4096m", "-verbose:gc" })

Results:
# JMH version: 1.19
# VM version: JDK 1.8.0_151, VM 25.151-b12
# VM invoker: C:\Program Files\Java\jre1.8.0_151\bin\java.exe
# VM options: -Xms4096m -Xmx4096m -verbose:gc
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: org.sample.MyBenchmark.testMethod
# Run progress: 0,00% complete, ETA 00:00:30
# Fork: 1 of 3
# Warmup Iteration 1: [GC (Allocation Failure) 1048576K->471308K(4019712K), 0.5042073 secs]
253,738 ms/op
# Warmup Iteration 2: [GC (Allocation Failure) 1519884K->767524K(4019712K), 0.5134485 secs]
[GC (Allocation Failure) 1816100K->1063700K(4019712K), 0.4944016 secs]
239,485 ms/op
# Warmup Iteration 3: [GC (Allocation Failure) 2112276K->1359908K(4019712K), 0.4932206 secs]
[GC (Allocation Failure) 2408484K->1656100K(4019712K), 0.4927346 secs]
212,527 ms/op
# Warmup Iteration 4: [GC (Allocation Failure) 2704676K->1952292K(3437056K), 0.4950558 secs]
[GC (Allocation Failure) 2418212K->2248136K(3728384K), 0.5216161 secs]
274,812 ms/op
# Warmup Iteration 5: [GC (Allocation Failure) 2714056K->2248120K(3728384K), 1.1181796 secs]
1193,570 ms/op
Iteration 1: [GC (Allocation Failure) 2714040K->2248136K(3728384K), 1.1205924 secs]
635,233 ms/op
Iteration 2: [GC (Allocation Failure) 2714056K->2248208K(3728384K), 1.1172824 secs]
1191,058 ms/op
Iteration 3: [GC (Allocation Failure) 2714128K->2248128K(3728384K), 1.1204173 secs]
634,724 ms/op
Iteration 4: [GC (Allocation Failure) 2714048K->2248168K(3728384K), 1.1276622 secs]
1204,162 ms/op
Iteration 5: [GC (Allocation Failure) 2714088K->2248208K(3728384K), 1.1234440 secs]
637,095 ms/op
# Run progress: 33,33% complete, ETA 00:00:26
# Fork: 2 of 3
# Warmup Iteration 1: [GC (Allocation Failure) 1048576K->471323K(4019712K), 0.5115167 secs]
252,752 ms/op
# Warmup Iteration 2: [GC (Allocation Failure) 1519899K->767531K(4019712K), 0.5080484 secs]
[GC (Allocation Failure) 1816107K->1063715K(4019712K), 0.4911580 secs]
238,002 ms/op
# Warmup Iteration 3: [GC (Allocation Failure) 2112291K->1359891K(4019712K), 0.4941844 secs]
[GC (Allocation Failure) 2408467K->1656083K(4019712K), 0.4919721 secs]
211,512 ms/op
# Warmup Iteration 4: [GC (Allocation Failure) 2704659K->1952275K(3437056K), 0.5040378 secs]
[GC (Allocation Failure) 2418195K->2248131K(3728384K), 0.5245958 secs]
276,411 ms/op
# Warmup Iteration 5: [GC (Allocation Failure) 2714051K->2248107K(3728384K), 1.1093389 secs]
1180,963 ms/op
Iteration 1: [GC (Allocation Failure) 2714027K->2248147K(3728384K), 1.1624603 secs]
652,570 ms/op
Iteration 2: [GC (Allocation Failure) 2714067K->2248155K(3728384K), 1.1054001 secs]
1177,282 ms/op
Iteration 3: [GC (Allocation Failure) 2714075K->2248147K(3728384K), 1.1469809 secs]
645,297 ms/op
Iteration 4: [GC (Allocation Failure) 2714067K->2248155K(3728384K), 1.1026592 secs]
1177,270 ms/op
Iteration 5: [GC (Allocation Failure) 2714075K->2248131K(3728384K), 1.1547916 secs]
651,064 ms/op
# Run progress: 66,67% complete, ETA 00:00:13
# Fork: 3 of 3
# Warmup Iteration 1: [GC (Allocation Failure) 1048576K->471299K(4019712K), 0.5202269 secs]
256,682 ms/op
# Warmup Iteration 2: [GC (Allocation Failure) 1519875K->767531K(4019712K), 0.5250656 secs]
[GC (Allocation Failure) 1816107K->1063731K(4019712K), 0.4992968 secs]
242,321 ms/op
# Warmup Iteration 3: [GC (Allocation Failure) 2112307K->1359939K(4019712K), 0.5023238 secs]
[GC (Allocation Failure) 2408515K->1656131K(4019712K), 0.5024830 secs]
215,224 ms/op
# Warmup Iteration 4: [GC (Allocation Failure) 2704707K->1952323K(3437056K), 0.5032048 secs]
[GC (Allocation Failure) 2418243K->2248127K(3728384K), 0.5347739 secs]
281,705 ms/op
# Warmup Iteration 5: [GC (Allocation Failure) 2714047K->2248167K(3728384K), 1.1136005 secs]
1188,756 ms/op
Iteration 1: [GC (Allocation Failure) 2714087K->2248143K(3728384K), 1.1517189 secs]
648,966 ms/op
Iteration 2: [GC (Allocation Failure) 2714063K->2248151K(3728384K), 1.1092591 secs]
1183,677 ms/op
Iteration 3: [GC (Allocation Failure) 2714071K->2248199K(3728384K), 1.1160382 secs]
631,461 ms/op
Iteration 4: [GC (Allocation Failure) 2714119K->2248207K(3728384K), 1.1566076 secs]
1238,393 ms/op
Iteration 5: [GC (Allocation Failure) 2714127K->2248215K(3728384K), 1.1143401 secs]
631,711 ms/op
Result "org.sample.MyBenchmark.testMethod":
862,664 ±(99.9%) 301,000 ms/op [Average]
(min, avg, max) = (631,461, 862,664, 1238,393), stdev = 281,556
CI (99.9%): [561,664, 1163,665] (assumes normal distribution)
# Run complete. Total time: 00:00:40
Benchmark Mode Cnt Score Error Units
MyBenchmark.testMethod avgt 15 862,664 ± 301,000 ms/op
I can't see that this helps. I have no idea about all these "GC (Allocation Failure)" messages. I've read about them, but what is the reason for the pauses? I gave the JVM 4 GB of heap.
Upvotes: 0
Views: 155
Reputation: 719386
There are a few likely sources of "instability" in this example:

1. JVM warmup effects: until the JIT compiler has compiled the hot loop (and possibly recompiled it at a higher optimization level), successive calls run at very different speeds.
2. Garbage collection: the measured loops trigger collections at essentially arbitrary points, so the pauses land inside some timed calls and not others.
3. GC overheads: allocating and collecting tens of millions of wrapper objects is work the collector has to do at some point, whether or not it happens inside the timed region.
One reason that you get different behavior for a list of wrappers versus an array of primitives is that creating each wrapper instance allocates a new object on the heap. In the primitive case, there are no wrapper objects to allocate.
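To illustrate: the assignment in[i] = i autoboxes, which the compiler turns into a call to Integer.valueOf(i), and for values outside the small cache (-128 to 127 by default) each call allocates a fresh Integer. A minimal sketch (the class name is just for illustration):

public class BoxingDemo {
    public static void main(String[] args) {
        Integer[] in = new Integer[2];
        int i = 1000;
        in[0] = i;                  // the compiler rewrites this as in[0] = Integer.valueOf(i)
        in[1] = Integer.valueOf(i); // 1000 is outside the default -128..127 cache,
                                    // so each of these calls allocates a new Integer
        System.out.println(in[0] == in[1]); // false with default settings: two distinct objects

        int[] prim = new int[1];
        prim[0] = i;                // no object at all: the value is stored directly
    }
}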
"And I know about flawed benchmarks and that using JMH is preferable. I just think my primitive tests are valid; please tell me if they are not."
They are not valid for the reasons (points 1 & 2) above.
Note, however, that "eliminating" the GC overheads (point 3 above) from the measurements would distort the results. The GC overheads should be considered as part of the overall costs when comparing arrays and lists.
Use JMH.
This seems to be the nub of your question:
"I still want to know is it OK to make unpredictable GC pauses"
GC pauses are effectively unpredictable, if that is what you mean.
If you try hard enough and know enough about what your application is doing, the current heap size, the GC parameters, etc, you can make an educated guess as to when they will happen1. But it is not practical to make such predictions a priori. There are too many variables.
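If you want to observe (rather than predict) collector activity from inside the process, the standard java.lang.management API exposes per-collector counters. A minimal sketch (the class name and the garbage-generating loop are illustrative):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcObserver {
    public static void main(String[] args) {
        // Generate some garbage so the collectors have work to do.
        Integer[] a = new Integer[5_000_000];
        for (int i = 0; i < a.length; i++) { a[i] = i; }

        // One bean per collector (typically young generation and old generation).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": "
                    + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms total");
        }
    }
}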
" ... and is it possible to get rid of them tuning GC".
There is no way to entirely eliminate GC pauses by tuning2. You can reduce their length (using a low-pause collector) but not eliminate them. And reduction comes at a cost of the JVM spending more time overall on garbage collection and related overheads.
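For example, on JDK 8 you could switch collectors and give the JVM a pause-time goal. Note that -XX:MaxGCPauseMillis is a goal the collector tries to meet, not a guarantee, and the 50 ms figure and main class here are purely illustrative:

java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -Xms4096m -Xmx4096m org.sample.MyApp
java -XX:+UseConcMarkSweepGC -Xms4096m -Xmx4096m org.sample.MyApp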
They are a fact of life.
"Seems it is not possible to work with 20M arrays without GC pauses."
Correct. Unless ... you can design your application so that it doesn't allocate any new objects after it has started up3 (see the sketch after the footnotes).
1 - For example, in your benchmark the code is simple enough that the behavior is reproducible.
2 - Even with low-pause collectors such as CMS and G1, a "new space" collection will pause all non-GC threads. The pause is relatively short ... provided that you don't make the "new" space too large. But you cannot eliminate it.
3 - In theory, an application that allocates no objects during normal operation should generate no garbage, and hence have no GC pauses ... after the JVM has warmed up. But to achieve this, you need to write your application to avoid using many / most standard Java SE classes. Very difficult.
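As an illustration of 3, a sketch of the allocation-free approach for this particular workload (the class is hypothetical): allocate a primitive buffer once at startup and reuse it, so steady-state operation creates no garbage at all.

public class PreallocatedBuffer {
    // Allocated once at startup; nothing is allocated per operation afterwards.
    private final int[] buffer = new int[20_000_000];

    void fill() {
        // Overwrites in place: no boxing, no new objects, no work for the GC.
        for (int i = 0; i < buffer.length; i++) { buffer[i] = i; }
    }

    public static void main(String[] args) {
        PreallocatedBuffer b = new PreallocatedBuffer();
        for (int n = 0; n < 5; n++) { b.fill(); }
    }
}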
Upvotes: 3