Scheintod

Reputation: 8115

Streams collect vs. map collect

Several Stream operations, especially those dealing with numeric values, can be written in more than one way. (The same question applies to average().)

So which method is preferable:

DoubleSummaryStatistics result;

result = stream()
        .collect( Collectors.summarizingDouble( weighter::weight ) );

vs.

result = stream()
        .mapToDouble( weighter::weight )
        .summaryStatistics();

and why?

(As I see it, the first one has the advantage of "visiting" each element only once, while the second one has cleaner semantics but visits each element at least twice. But is this even important, or correct?)

Upvotes: 4

Views: 2107

Answers (3)

Lukasz Wiktor

Reputation: 20422

Many predefined Collectors might seem redundant since they represent operations that are directly accessible on a Stream. However, they make sense when you start to compose Collectors. For example:

Map<Department, DoubleSummaryStatistics> statsByDept = employees.stream()
    .collect(Collectors.groupingBy(Employee::getDepartment,
                                   Collectors.summarizingDouble(Employee::getSalary)));
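To try the composition above without the rest of the application, here is a minimal self-contained sketch. The Employee class below is a hypothetical stand-in (the original type is not shown), and the department is simplified to a String; only Collectors.groupingBy and Collectors.summarizingDouble come from the JDK.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupedStats {

    // Hypothetical stand-in for the Employee type referenced in the answer.
    static final class Employee {
        private final String department;
        private final double salary;

        Employee(String department, double salary) {
            this.department = department;
            this.salary = salary;
        }

        String getDepartment() { return department; }
        double getSalary() { return salary; }
    }

    // Groups employees by department and summarizes salaries per group.
    // summarizingDouble acts as the downstream collector of groupingBy,
    // which a plain mapToDouble(...).summaryStatistics() call cannot do.
    static Map<String, DoubleSummaryStatistics> statsByDept(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::getDepartment,
                                               Collectors.summarizingDouble(Employee::getSalary)));
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("eng", 100.0),
                new Employee("eng", 200.0),
                new Employee("ops", 50.0));

        statsByDept(employees)
                .forEach((dept, stats) -> System.out.println(dept + " -> " + stats));
    }
}
```

Each map value is a full DoubleSummaryStatistics, so count, sum, min, max and average are available per department from a single pass over the stream.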

Upvotes: 1

assylias

Reputation: 328608

Performance-wise, it seems that the second approach (map, then summarize) is faster than the first approach (using the collector) for larger inputs; at n = 10 the two are indistinguishable:

Benchmark                         (n)  Mode  Samples     Score     Error  Units
c.a.p.SO26775395.collector         10  avgt       10     0.110 ±   0.004  us/op
c.a.p.SO26775395.collector       1000  avgt       10     9.134 ±   0.310  us/op
c.a.p.SO26775395.collector    1000000  avgt       10  9091.649 ± 274.113  us/op
c.a.p.SO26775395.summary           10  avgt       10     0.110 ±   0.003  us/op
c.a.p.SO26775395.summary         1000  avgt       10     5.593 ±   0.234  us/op
c.a.p.SO26775395.summary      1000000  avgt       10  5598.776 ± 153.314  us/op

Benchmark code:

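import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import static java.util.stream.Collectors.toList;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)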
@BenchmarkMode(Mode.AverageTime)
public class SO26775395 {

  @Param({"10", "1000", "1000000"}) int n;
  List<Weighter> weights;

  @Setup public void setup() {
    weights = new Random().doubles(n)
            .mapToObj(Weighter::new)
            .collect(toList());
  }

  @Benchmark public DoubleSummaryStatistics collector() {
    return weights.stream().collect(Collectors.summarizingDouble(Weighter::w));
  }

  @Benchmark public DoubleSummaryStatistics summary() {
    return weights.stream().mapToDouble(Weighter::w).summaryStatistics();
  }

  public static class Weighter {
    private final double w;
    public Weighter(double w) { this.w = w; }
    public double w() { return w; }
  }

}
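On the question's worry that the mapToDouble variant visits each element twice: one way to check is to count how often the mapping function is called. The sketch below does that for a sequential stream over a List<Double>; the class and method names are made up for illustration, and it only measures calls, not timing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class VisitCount {

    // Counts invocations of the mapping function when it is passed to the collector.
    static int callsWithCollector(List<Double> values) {
        AtomicInteger calls = new AtomicInteger();
        values.stream()
                .collect(Collectors.summarizingDouble(v -> {
                    calls.incrementAndGet();
                    return v.doubleValue();
                }));
        return calls.get();
    }

    // Counts invocations of the mapping function when it is used in mapToDouble.
    static int callsWithMapToDouble(List<Double> values) {
        AtomicInteger calls = new AtomicInteger();
        values.stream()
                .mapToDouble(v -> {
                    calls.incrementAndGet();
                    return v.doubleValue();
                })
                .summaryStatistics();
        return calls.get();
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        System.out.println("collector:   " + callsWithCollector(values));
        System.out.println("mapToDouble: " + callsWithMapToDouble(values));
    }
}
```

Both methods return the size of the list: stream stages are fused into a single traversal, so the map step and the terminal summaryStatistics() step each see an element once, in the same pass.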

Upvotes: 5

danibuiza

Reputation: 77

summaryStatistics() gives you more information (count, sum, min, max and average in one result), but its performance may not be what you want; it depends on what you need as output.
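To make the "depends on what you want" point concrete, here is a small sketch contrasting a single-value terminal operation with the full summary. The class and method names are hypothetical; average() and summaryStatistics() are the standard DoubleStream methods.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.OptionalDouble;

class AverageVsSummary {

    // Only the average is needed: the result is empty for an empty stream.
    static OptionalDouble averageOnly(List<Double> values) {
        return values.stream()
                .mapToDouble(Double::doubleValue)
                .average();
    }

    // Count, sum, min, max and average are all needed.
    static DoubleSummaryStatistics fullSummary(List<Double> values) {
        return values.stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 3.0);
        System.out.println(averageOnly(values));
        System.out.println(fullSummary(values));
    }
}
```

One behavioural difference worth knowing: on an empty stream average() returns an empty OptionalDouble, while DoubleSummaryStatistics.getAverage() returns 0.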

Upvotes: -2
