Reputation: 1120
Could someone explain to me why the following code is faster in the stream version than in the one that only filters by returning a sublist of the main list?
public static final int N = 50000000;

static List<Integer> sourceList = new ArrayList<>();
static {
    for (int i = 0; i < N; i++) {
        sourceList.add(i);
    }
}
@Benchmark
public List<Pair<Integer, Integer>> vanilla() {
    List<Pair<Integer, Integer>> collect1 =
        sourceList.stream()
            .map(integer -> Pair.of(integer, integer))
            .collect(Collectors.toList());
    return collect1.subList(1000, 100000);
}
@Benchmark
public List<Pair<Integer, Integer>> stream() {
    return sourceList.stream()
        .map(value -> Pair.of(value, value))
        .filter(value -> value.getLeft() > 1000 && value.getLeft() < 100000)
        .collect(Collectors.toList());
}
Benchmark     Mode  Cnt    Score   Error  Units
Test.stream   avgt   20    9.867 ± 0.218  ns/op
Test.vanilla  avgt   20  183.304 ± 8.550  ns/op
I ran the test using JMH and I don't understand the results. I thought that adding the mapping function, which wraps each Integer value in a Pair, would force the stream to create all those new objects just to pass them to the filter method, which then extracts the left part of each Pair for comparison. That sounds more expensive to me than the other approach, where there is no filtering and the result is just a sublist of the original list, hence no need to traverse the whole list.
Am I missing something here?
Upvotes: 1
Views: 534
Reputation: 30676
The performance problem is not caused by subList itself; it simply wraps the main list in a view.
An ArrayList repeatedly resets its capacity, copying its elements into a new, larger array whenever it needs room to add more elements. vanilla therefore uses far more memory while adding all 50000000 mapped elements, so it is slower than stream, which only collects the roughly 99,000 elements that pass the filter.
The following section of the ArrayList class expands the capacity on demand; the more elements you add to an ArrayList, the more ensureCapacityInternal calls (and resulting array copies) you get:
private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    ensureExplicitCapacity(minCapacity);
}
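For completeness, ensureExplicitCapacity in turn calls grow, shown here as it appears in the JDK 8 sources (with one annotation added), which allocates a new array about 1.5 times the old capacity and copies every element across:

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1); // grow by ~1.5x
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

Starting from the default capacity of 10 and growing by about 1.5x each time, reaching 50000000 elements takes roughly 40 such grow-and-copy cycles and copies well over 100 million references in total.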
The following version of vanilla, which pre-sizes the destination list, may even be faster than stream, but it still uses much more memory:
@Benchmark
public List<Pair<Integer, Integer>> vanilla() {
    List<Pair<Integer, Integer>> main =
        sourceList.stream()
            .map(integer -> Pair.of(integer, integer))
            .collect(Collectors.toCollection(() -> new ArrayList<>(N)));
    return main.subList(1000, 100000);
}
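Pre-sizing with new ArrayList<>(N) replaces all the incremental grow-and-copy cycles with a single up-front allocation, but the benchmark still has to create 50000000 Pair objects and fill (and later garbage-collect) a 50-million-slot array, so stream keeps its advantage on memory.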
Upvotes: 0
Reputation: 691715
Most probably because the first one has to fill a whole list of 50000000 elements, which involves allocating more memory and making several copies of the list's internal array each time its capacity is exhausted.
The second one only has to create a list of roughly 99,000 elements, thus allocating less memory and making far fewer copies of internal arrays.
An even faster solution would be to filter before mapping, and thus avoid the useless boxing and creation of pairs that are immediately thrown away. Limiting the stream at the upper bound, 100000, would of course be faster too, since the source list is sorted and nothing after that point can pass the filter.
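A minimal sketch of that reordering, assuming the same JMH setup (the benchmark name filterFirst and the exact limit value are my own choices, not taken from the question):

@Benchmark
public List<Pair<Integer, Integer>> filterFirst() {
    return sourceList.stream()
        .filter(value -> value > 1000 && value < 100000) // compare the plain Integers first
        .limit(98999)  // exactly 98,999 values lie strictly between 1000 and 100000,
                       // so the stream can stop early instead of scanning all 50M elements
        .map(value -> Pair.of(value, value))             // only the surviving values are wrapped in Pairs
        .collect(Collectors.toList());
}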
Upvotes: 2