Juan Vega

Reputation: 1120

Why is filtering a stream faster than code that simply returns a sublist?

Could someone explain to me why the following code is faster in the stream version than in the vanilla version, which only "filters" by returning a sublist of the main list?

public static final int N = 50000000;

static List<Integer> sourceList = new ArrayList<>();

static {
  for (int i = 0; i < N; i++) {
    sourceList.add(i);
  }
}

@Benchmark
public List<Pair<Integer, Integer>> vanilla() {
   List<Pair<Integer, Integer>> collect1 = sourceList.stream()
       .map(integer -> Pair.of(integer, integer))
       .collect(Collectors.toList());
   return collect1.subList(1000, 100000);
}

@Benchmark
public List<Pair<Integer, Integer>> stream() {
  return sourceList.stream()
    .map(value -> Pair.of(value, value))
    .filter(value -> value.getLeft() > 1000 && value.getLeft() < 100000)
    .collect(Collectors.toList());
}

Benchmark     Mode  Cnt    Score   Error  Units
Test.stream   avgt   20    9.867 ± 0.218  ns/op
Test.vanilla  avgt   20  183.304 ± 8.550  ns/op

I ran the test using JMH and I don't understand the results. I thought that adding the mapping function, which wraps each Integer value in a Pair, would force the stream to create all those new objects just to pass them to the filter method, which then extracts the left part of the Pair for the comparison. That sounds more expensive to me than the other approach, where there is no filtering and the result is just a sublist of the original list, so the whole list never has to be traversed.

Am I missing something here?

Upvotes: 1

Views: 534

Answers (2)

holi-java

Reputation: 30676

The performance problem is not caused by subList; it simply wraps a view over the main list.
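Just to illustrate (a standalone sketch, not part of the benchmark), subList returns a view backed by the original list, so nothing is copied and writes go through to the backing list:

List<Integer> big = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4));
List<Integer> view = big.subList(1, 4); // view of [1, 2, 3], no elements are copied
view.set(0, 99);                        // writes through to the backing list
System.out.println(big);                // prints [0, 99, 2, 3, 4]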

An ArrayList repeatedly grows its capacity and copies its elements into a larger array whenever it needs room for more elements. vanilla has to add 50000000 elements, so it allocates far more memory and is slower than stream, which only adds the roughly 99000 elements in the range (1000, 100000).

The following snippet is the part of the ArrayList class that expands the capacity on demand; the more elements you add to an ArrayList, the more ensureCapacityInternal calls (and array copies) you get:

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}
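For a rough sense of how much copying that causes when adding 50000000 elements, here is a small sketch (my own back-of-the-envelope estimate, assuming the default initial capacity of 10 and the JDK's newCapacity = oldCapacity + (oldCapacity >> 1) growth rule) that counts the grow operations:

long capacity = 10;                        // DEFAULT_CAPACITY
int grows = 0;
long copied = 0;
while (capacity < 50_000_000L) {
    copied += capacity;                    // grow() copies the whole old array
    capacity = capacity + (capacity >> 1); // oldCapacity + oldCapacity / 2
    grows++;
}
System.out.println(grows + " grow() calls, roughly " + copied + " element copies");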

The following version of vanilla, which presizes the target list and so avoids the repeated array copies, may even be faster than stream, but it still uses far more memory:

@Benchmark
public List<Pair<Integer, Integer>> vanilla() {
   List<Pair<Integer, Integer>> main = sourceList.stream()
       .map(integer -> Pair.of(integer, integer))
       .collect(Collectors.toCollection(() -> new ArrayList<>(N)));
   return main.subList(1000, 100000);
}

Upvotes: 0

JB Nizet

Reputation: 691715

Most probably because the first one has to fill a whole list of 50000000 elements, which means allocating more memory and making several copies of the list's internal array every time its capacity is reached.

The second one only has to create a list of 99000 elements, thus allocating less memory and making fewer copies of the internal array.

An even faster solution would be to filter before mapping, and thus avoid useless boxing and the creation of pairs for elements that are discarded anyway. Limiting the stream to the first 100,000 elements would of course make it faster too, since no later element can pass the filter.
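As a sketch of that variant (the filterFirst name is just illustrative; it reuses sourceList and Pair from the question), filtering on the raw Integer first means only the roughly 99,000 surviving values are ever wrapped in a Pair:

@Benchmark
public List<Pair<Integer, Integer>> filterFirst() {
  return sourceList.stream()
    .filter(value -> value > 1000 && value < 100000) // filter before creating any Pair
    .map(value -> Pair.of(value, value))
    .collect(Collectors.toList());
}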

Upvotes: 2
