Reputation: 142
While toying with the new Java streams, I noticed something strange about the performance of parallel streams. I used a simple program that reads the words from a text file and counts the words with length > 5 (the test file contains 30,000 words):
String contents = new String(Files.readAllBytes(Paths.get("text.txt")));
List<String> words = Arrays.asList(contents.split("[\\P{L}]+"));

long startTime;
for (int i = 0; i < 100; i++) {
    startTime = System.nanoTime();
    words.parallelStream().filter(w -> w.length() > 5).count();
    System.out.println("Time elapsed [PAR]: " + (System.nanoTime() - startTime));

    startTime = System.nanoTime();
    words.stream().filter(w -> w.length() > 5).count();
    System.out.println("Time elapsed [SEQ]: " + (System.nanoTime() - startTime));

    System.out.println("------------------");
}
This produces the following output on my machine (showing only the first 5 and the last 5 loop iterations):
Time elapsed [PAR]: 114185196
Time elapsed [SEQ]: 3222664
------------------
Time elapsed [PAR]: 569611
Time elapsed [SEQ]: 797113
------------------
Time elapsed [PAR]: 678231
Time elapsed [SEQ]: 414807
------------------
Time elapsed [PAR]: 755633
Time elapsed [SEQ]: 679085
------------------
Time elapsed [PAR]: 755633
Time elapsed [SEQ]: 393425
------------------
...
Time elapsed [PAR]: 90232
Time elapsed [SEQ]: 163785
------------------
Time elapsed [PAR]: 80396
Time elapsed [SEQ]: 154805
------------------
Time elapsed [PAR]: 83817
Time elapsed [SEQ]: 154377
------------------
Time elapsed [PAR]: 81679
Time elapsed [SEQ]: 186449
------------------
Time elapsed [PAR]: 68849
Time elapsed [SEQ]: 154804
------------------
Why is the first iteration about 100 times slower than the rest? Why is the parallel stream slower than the sequential one in the first iterations, yet about twice as fast in the last ones? Why do both the sequential and the parallel streams become faster over time? Is this related to some loop optimization?
Later edit: At Luigi's suggestion, I implemented the benchmark using JUnitBenchmarks:
// JUnitBenchmarks only honors @BenchmarkOptions when this rule is present
@Rule
public TestRule benchmarkRun = new BenchmarkRule();

List<String> words = null;

@Before
public void setup() {
    try {
        String contents = new String(Files.readAllBytes(Paths.get("text.txt")));
        words = Arrays.asList(contents.split("[\\P{L}]+"));
    } catch (IOException e) {
        e.printStackTrace();
    }
}

@BenchmarkOptions(benchmarkRounds = 100)
@Test
public void parallelTest() {
    words.parallelStream().filter(w -> w.length() > 5).count();
}

@BenchmarkOptions(benchmarkRounds = 100)
@Test
public void sequentialTest() {
    words.stream().filter(w -> w.length() > 5).count();
}
I also increased the number of words in the test file to 300,000. The new results are:
Benchmark.sequentialTest: [measured 100 out of 105 rounds, threads: 1 (sequential)]
round: 0.08 [+- 0.04], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 62, GC.time: 1.53, time.total: 8.65, time.warmup: 0.81, time.bench: 7.85
Benchmark.parallelTest: [measured 100 out of 105 rounds, threads: 1 (sequential)]
round: 0.06 [+- 0.02], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 32, GC.time: 0.79, time.total: 6.82, time.warmup: 0.39, time.bench: 6.43
So it seems that the initial results were caused by a flawed microbenchmark setup...
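For reference, the same measurement could also be written with JMH, which handles warmup, forking, and dead-code elimination automatically. A minimal sketch (assuming JMH is on the classpath; the class and method names are illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(1)
public class StreamCountBenchmark {

    List<String> words;

    @Setup
    public void setup() throws IOException {
        String contents = new String(Files.readAllBytes(Paths.get("text.txt")));
        words = Arrays.asList(contents.split("[\\P{L}]+"));
    }

    // Returning the count hands the result to JMH,
    // which prevents dead-code elimination.
    @Benchmark
    public long sequential() {
        return words.stream().filter(w -> w.length() > 5).count();
    }

    @Benchmark
    public long parallel() {
        return words.parallelStream().filter(w -> w.length() > 5).count();
    }
}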
Upvotes: 3
Views: 1252
Reputation: 111389
The HotSpot JVM starts out executing the program in interpreted mode and compiles frequently executed parts to native code after some analysis. That is why the initial iterations of a loop are generally slow: they run interpreted, before the JIT compiler has kicked in.
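As a rough illustration (the warm-up count and variable names below are my own), running the pipeline many times before timing lets HotSpot compile the hot paths, so the measured run no longer pays the interpretation and compilation cost:

// Illustrative sketch: warm up first so HotSpot can JIT-compile the stream
// pipeline, then time a single already-compiled run. The warm-up count is
// arbitrary; accumulating into 'sink' keeps the work from being optimized away.
long sink = 0;
for (int i = 0; i < 1_000; i++) {
    sink += words.stream().filter(w -> w.length() > 5).count();
}
long start = System.nanoTime();
long count = words.stream().filter(w -> w.length() > 5).count();
System.out.println("Warmed-up time: " + (System.nanoTime() - start)
        + " ns, count=" + count + ", sink=" + sink);

You can watch the compilation happening by running the program with -XX:+PrintCompilation.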
Upvotes: 3