MWB
MWB

Reputation: 1879

How to get multiple values from an object using a single stream operation?

I want to determine the minimum area required to display a collection of points. The easy way is to loop through the collection like this:

int minX = Integer.MAX_VALUE;
int maxX = Integer.MIN_VALUE;
int minY = Integer.MAX_VALUE;
int maxY = Integer.MIN_VALUE;
for (Point point: points) {
    if (point.x < minX) {
        minX = point.x;
    }
    if (point.x > maxX) {
        maxX = point.x;
    }
    if (point.y < minY) {
        minY = point.y;
    }
    if (point.y > maxY) {
        maxY = point.y;
    }
}

I am getting to know streams. To do the same, you can do the following:

int minX = points.stream().mapToInt(point -> point.x).min().orElse(-1);
int maxX = points.stream().mapToInt(point -> point.x).max().orElse(-1);
int minY = points.stream().mapToInt(point -> point.y).min().orElse(-1);
int maxY = points.stream().mapToInt(point -> point.y).max().orElse(-1);

Both give the same result. However, although the streams approach is elegant, it is much slower (as expected).

Is there a way to get minX, maxX, minY and maxY in a single stream operation?

Upvotes: 10

Views: 3426

Answers (5)

fps
fps

Reputation: 34460

JDK 12 and above has Collectors.teeing (webrev and CSR), which collects to two different collectors and then merges both partial results into a final result.

You could use it here to collect to two IntSummaryStatistics for both the x coordinate and the y coordinate:

List<IntSummaryStatistics> stats = points.stream()
    .collect(Collectors.teeing(
             Collectors.mapping(p -> p.x, Collectors.summarizingInt()),
             Collectors.mapping(p -> p.y, Collectors.summarizingInt()),
             List::of));

int minX = stats.get(0).getMin();
int maxX = stats.get(0).getMax();
int minY = stats.get(1).getMin();
int maxY = stats.get(1).getMax();

Here the first collector gathers statistics for x and the second one for y. Then, statistics for both x and y are merged into a List by means of the JDK 9 List.of factory method that accepts two elements.

An aternative to List::of for the merger would be:

(xStats, yStats) -> Arrays.asList(xStats, yStats)

If you happen to not have JDK 12 installed on your machine, here's a simplified, generic version of the teeing method that you can safely use as a utility method:

public static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
        Collector<? super T, A1, R1> downstream1,
        Collector<? super T, A2, R2> downstream2,
        BiFunction<? super R1, ? super R2, R> merger) {

    class Acc {
        A1 acc1 = downstream1.supplier().get();
        A2 acc2 = downstream2.supplier().get();

        void accumulate(T t) {
            downstream1.accumulator().accept(acc1, t);
            downstream2.accumulator().accept(acc2, t);
        }

        Acc combine(Acc other) {
            acc1 = downstream1.combiner().apply(acc1, other.acc1);
            acc2 = downstream2.combiner().apply(acc2, other.acc2);
            return this;
        }

        R applyMerger() {
            R1 r1 = downstream1.finisher().apply(acc1);
            R2 r2 = downstream2.finisher().apply(acc2);
            return merger.apply(r1, r2);
        }
    }

    return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger);
}

Please note that I'm not considering the characteristics of the downstream collectors when creating the returned collector.

Upvotes: 12

Andrew
Andrew

Reputation: 49606

By analogy with IntSummaryStatistics, create a class PointStatistics which collects the information you need. It defines two methods: one for recording values from a Point, one for combining two Statistics.

class PointStatistics {
    private int minX = Integer.MAX_VALUE;
    private int maxX = Integer.MIN_VALUE;

    private int minY = Integer.MAX_VALUE;
    private int maxY = Integer.MIN_VALUE;

    public void accept(Point p) {
        minX = Math.min(minX, p.x);
        maxX = Math.max(maxX, p.x);

        minY = Math.min(minY, p.y);
        maxY = Math.max(minY, p.y);
    }

    public void combine(PointStatistics o) {
        minX = Math.min(minX, o.minX);
        maxX = Math.max(maxX, o.maxX);

        minY = Math.min(minY, o.minY);
        maxY = Math.max(maxY, o.maxY);
    }

    // getters
}

Then you can collect a Stream<Point> into a PointStatistics.

class Program {
    public static void main(String[] args) {
        List<Point> points = new ArrayList<>();

        // populate 'points'

        PointStatistics statistics = points
                    .stream()
                    .collect(PointStatistics::new, PointStatistics::accept, PointStatistics::combine);
    }
}

UPDATE

I was completely baffled by the conclusion drawn by OP, so I decided to write JMH benchmarks.

Benchmark settings:

# JMH version: 1.21
# VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
# Warmup: 1 iterations, 10 s each
# Measurement: 10 iterations, 10 s each
# Timeout: 10 min per iteration
# Benchmark mode: Average time, time/op

For each iteration, I was generating a shared list of random Points (new Point(random.nextInt(), random.nextInt())) of size 100K, 1M, 10M.

The results are

100K

Benchmark                        Mode  Cnt  Score   Error  Units

customCollector                  avgt   10  6.760 ± 0.789  ms/op
forEach                          avgt   10  0.255 ± 0.033  ms/op
fourStreams                      avgt   10  5.115 ± 1.149  ms/op
statistics                       avgt   10  0.887 ± 0.114  ms/op
twoStreams                       avgt   10  2.869 ± 0.567  ms/op

1M

Benchmark                        Mode  Cnt   Score   Error  Units

customCollector                  avgt   10  68.117 ± 4.822  ms/op
forEach                          avgt   10   3.939 ± 0.559  ms/op
fourStreams                      avgt   10  57.800 ± 4.817  ms/op
statistics                       avgt   10   9.904 ± 1.048  ms/op
twoStreams                       avgt   10  32.303 ± 2.498  ms/op

10M

Benchmark                        Mode  Cnt    Score     Error  Units

customCollector                  avgt   10  714.016 ± 151.558  ms/op
forEach                          avgt   10   54.334 ±   9.820  ms/op
fourStreams                      avgt   10  699.599 ± 138.332  ms/op
statistics                       avgt   10  148.649 ±  26.248  ms/op
twoStreams                       avgt   10  429.050 ±  72.879  ms/op

Upvotes: 9

MWB
MWB

Reputation: 1879

Thanks everyone for all the suggestions and answers. This is very helpful and I learned a lot!

I decided to give most of your solutions a go (except for the JDK12 solution). For some of them you already provide me with the code. In addition, I made my own Collector.

class extremesCollector implements Collector<Point, Map<String, Integer>, Map<String , Integer>> {

    @Override
    public Supplier<Map<String, Integer>> supplier() {
        Map<String, Integer> map = new HashMap<>();
        map.put("xMin", Integer.MAX_VALUE);
        map.put("yMin", Integer.MAX_VALUE);
        map.put("xMax", Integer.MIN_VALUE);
        map.put("yMax", Integer.MIN_VALUE);
        return () -> map;
    }

    @Override
    public BiConsumer<Map<String, Integer>, Point> accumulator() {
        return (a, b) -> {
            a.put("xMin", Math.min(a.get("xMin"), b.x));
            a.put("yMin", Math.min(a.get("yMin"), b.y));
            a.put("xMax", Math.max(a.get("xMax"), b.x));
            a.put("yMax", Math.max(a.get("yMax"), b.y));
        };
    }

    @Override
    public Function<Map<String, Integer>, Map<String, Integer>> finisher() {
        return Function.identity();
    }

    @Override
    public BinaryOperator<Map<String, Integer>> combiner() {
        return (a, b) -> {
            a.put("xMin", Math.min(a.get("xMin"), b.get("xMin")));
            a.put("yMin", Math.min(a.get("yMin"), b.get("yMin")));
            a.put("xMax", Math.max(a.get("xMax"), b.get("xMax")));
            a.put("yMax", Math.max(a.get("yMax"), b.get("yMax")));
            return a;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        Set<Characteristics> characteristics = new HashSet<>();
        characteristics.add(Characteristics.UNORDERED);
        characteristics.add(Characteristics.CONCURRENT);
        characteristics.add(Characteristics.IDENTITY_FINISH);
        return characteristics;
    }
}

Results

I gave all of them a try and compared the results. Good news: for all of them, I got the same result as far as the values are concerned!

With respect to the speed, here is the ranking:

  1. for-loops
  2. four individual streams
  3. stream with self-made Collector
  4. parallel stream with self-made Collector
  5. statistics approach provided by Andrew Tobilko

Number 2 and 3 are actually very close in terms of speed. The parallel version is probably slower because my dataset is too small.

Upvotes: 2

davidxxx
davidxxx

Reputation: 131346

You could divide by two the iterations with summaryStatistics() while keeping a straight code :

IntSummaryStatistics stat = points.stream().mapToInt(point -> point.x).summaryStatistics();
int minX = stat.getMin();
int maxX = stat.getMax();

And do the same thing with point.y.
You could factor out in this way :

Function<ToIntFunction<Point>, IntSummaryStatistics> statFunction =
        intExtractor -> points.stream()
                              .mapToInt(p -> intExtractor.applyAsInt(pp))
                              .summaryStatistics();

IntSummaryStatistics statX = statFunction.apply(p -> p.x);
IntSummaryStatistics statY = statFunction.apply(p -> p.y);

A custom collector is a possibility but note that you should implement the combiner part that will make your code harder to read.
So but if you need to use parallel stream you should stay to the imperative way.
While you could improve your actual code by relying the Math.min and Math.max functions :

for (Point p : points) {
    minX = Math.min(p.x, minX);
    minY = Math.min(p.y, minY);
    maxY = Math.max(p.x, maxX);
    maxY = Math.max(p.y, maxY);
}

Upvotes: 9

Nikolas
Nikolas

Reputation: 44398

You are able to use 2 Streams using Stream::reduce to get a point with a minimum and a point with a maximum. I don't recommend to concatenate the results to a one stream since there might be hard to distinguish the difference between the minimum, maximum and the coordinates.

Point min = points
    .stream()
    .reduce((l, r) -> new Point(Math.min(l.y, r.y), Math.min(l.y, r.y))
    .orElse(new Point(-1, -1));

Point max = points
    .stream()
    .reduce((l, r) -> new Point(Math.max(l.y, r.y), Math.max(l.y, r.y))
    .orElse(new Point(-1, -1));

As the BinaryOperator<Point> use the two subsequent Points and a ternary operator to find out the minimum/maximum which is passed to a new object Point and returned using Optional::orElse with the default -1, -1 coordinates.

Upvotes: 1

Related Questions