Reputation: 1879
I want to determine the minimum area required to display a collection of points. The easy way is to loop through the collection like this:
int minX = Integer.MAX_VALUE;
int maxX = Integer.MIN_VALUE;
int minY = Integer.MAX_VALUE;
int maxY = Integer.MIN_VALUE;
for (Point point: points) {
if (point.x < minX) {
minX = point.x;
}
if (point.x > maxX) {
maxX = point.x;
}
if (point.y < minY) {
minY = point.y;
}
if (point.y > maxY) {
maxY = point.y;
}
}
I am getting to know streams. To do the same, you can do the following:
int minX = points.stream().mapToInt(point -> point.x).min().orElse(-1);
int maxX = points.stream().mapToInt(point -> point.x).max().orElse(-1);
int minY = points.stream().mapToInt(point -> point.y).min().orElse(-1);
int maxY = points.stream().mapToInt(point -> point.y).max().orElse(-1);
Both give the same result. However, although the streams approach is elegant, it is much slower (as expected).
Is there a way to get minX
, maxX
, minY
and maxY
in a single stream operation?
Upvotes: 10
Views: 3426
Reputation: 34460
JDK 12 and above has Collectors.teeing
(webrev and CSR), which collects to two different collectors and then merges both partial results into a final result.
You could use it here to collect to two IntSummaryStatistics
for both the x
coordinate and the y
coordinate:
List<IntSummaryStatistics> stats = points.stream()
.collect(Collectors.teeing(
Collectors.mapping(p -> p.x, Collectors.summarizingInt()),
Collectors.mapping(p -> p.y, Collectors.summarizingInt()),
List::of));
int minX = stats.get(0).getMin();
int maxX = stats.get(0).getMax();
int minY = stats.get(1).getMin();
int maxY = stats.get(1).getMax();
Here the first collector gathers statistics for x
and the second one for y
. Then, statistics for both x
and y
are merged into a List
by means of the JDK 9 List.of
factory method that accepts two elements.
An aternative to List::of
for the merger would be:
(xStats, yStats) -> Arrays.asList(xStats, yStats)
If you happen to not have JDK 12 installed on your machine, here's a simplified, generic version of the teeing
method that you can safely use as a utility method:
public static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
Collector<? super T, A1, R1> downstream1,
Collector<? super T, A2, R2> downstream2,
BiFunction<? super R1, ? super R2, R> merger) {
class Acc {
A1 acc1 = downstream1.supplier().get();
A2 acc2 = downstream2.supplier().get();
void accumulate(T t) {
downstream1.accumulator().accept(acc1, t);
downstream2.accumulator().accept(acc2, t);
}
Acc combine(Acc other) {
acc1 = downstream1.combiner().apply(acc1, other.acc1);
acc2 = downstream2.combiner().apply(acc2, other.acc2);
return this;
}
R applyMerger() {
R1 r1 = downstream1.finisher().apply(acc1);
R2 r2 = downstream2.finisher().apply(acc2);
return merger.apply(r1, r2);
}
}
return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger);
}
Please note that I'm not considering the characteristics of the downstream collectors when creating the returned collector.
Upvotes: 12
Reputation: 49606
By analogy with IntSummaryStatistics
, create a class PointStatistics
which collects the information you need. It defines two methods: one for recording values from a Point
, one for combining two Statistics
.
class PointStatistics {
private int minX = Integer.MAX_VALUE;
private int maxX = Integer.MIN_VALUE;
private int minY = Integer.MAX_VALUE;
private int maxY = Integer.MIN_VALUE;
public void accept(Point p) {
minX = Math.min(minX, p.x);
maxX = Math.max(maxX, p.x);
minY = Math.min(minY, p.y);
maxY = Math.max(minY, p.y);
}
public void combine(PointStatistics o) {
minX = Math.min(minX, o.minX);
maxX = Math.max(maxX, o.maxX);
minY = Math.min(minY, o.minY);
maxY = Math.max(maxY, o.maxY);
}
// getters
}
Then you can collect a Stream<Point>
into a PointStatistics
.
class Program {
public static void main(String[] args) {
List<Point> points = new ArrayList<>();
// populate 'points'
PointStatistics statistics = points
.stream()
.collect(PointStatistics::new, PointStatistics::accept, PointStatistics::combine);
}
}
UPDATE
I was completely baffled by the conclusion drawn by OP, so I decided to write JMH benchmarks.
Benchmark settings:
# JMH version: 1.21
# VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
# Warmup: 1 iterations, 10 s each
# Measurement: 10 iterations, 10 s each
# Timeout: 10 min per iteration
# Benchmark mode: Average time, time/op
For each iteration, I was generating a shared list of random Point
s (new Point(random.nextInt(), random.nextInt())
) of size 100K, 1M, 10M.
The results are
100K
Benchmark Mode Cnt Score Error Units
customCollector avgt 10 6.760 ± 0.789 ms/op
forEach avgt 10 0.255 ± 0.033 ms/op
fourStreams avgt 10 5.115 ± 1.149 ms/op
statistics avgt 10 0.887 ± 0.114 ms/op
twoStreams avgt 10 2.869 ± 0.567 ms/op
1M
Benchmark Mode Cnt Score Error Units
customCollector avgt 10 68.117 ± 4.822 ms/op
forEach avgt 10 3.939 ± 0.559 ms/op
fourStreams avgt 10 57.800 ± 4.817 ms/op
statistics avgt 10 9.904 ± 1.048 ms/op
twoStreams avgt 10 32.303 ± 2.498 ms/op
10M
Benchmark Mode Cnt Score Error Units
customCollector avgt 10 714.016 ± 151.558 ms/op
forEach avgt 10 54.334 ± 9.820 ms/op
fourStreams avgt 10 699.599 ± 138.332 ms/op
statistics avgt 10 148.649 ± 26.248 ms/op
twoStreams avgt 10 429.050 ± 72.879 ms/op
Upvotes: 9
Reputation: 1879
Thanks everyone for all the suggestions and answers. This is very helpful and I learned a lot!
I decided to give most of your solutions a go (except for the JDK12 solution). For some of them you already provide me with the code. In addition, I made my own Collector
.
class extremesCollector implements Collector<Point, Map<String, Integer>, Map<String , Integer>> {
@Override
public Supplier<Map<String, Integer>> supplier() {
Map<String, Integer> map = new HashMap<>();
map.put("xMin", Integer.MAX_VALUE);
map.put("yMin", Integer.MAX_VALUE);
map.put("xMax", Integer.MIN_VALUE);
map.put("yMax", Integer.MIN_VALUE);
return () -> map;
}
@Override
public BiConsumer<Map<String, Integer>, Point> accumulator() {
return (a, b) -> {
a.put("xMin", Math.min(a.get("xMin"), b.x));
a.put("yMin", Math.min(a.get("yMin"), b.y));
a.put("xMax", Math.max(a.get("xMax"), b.x));
a.put("yMax", Math.max(a.get("yMax"), b.y));
};
}
@Override
public Function<Map<String, Integer>, Map<String, Integer>> finisher() {
return Function.identity();
}
@Override
public BinaryOperator<Map<String, Integer>> combiner() {
return (a, b) -> {
a.put("xMin", Math.min(a.get("xMin"), b.get("xMin")));
a.put("yMin", Math.min(a.get("yMin"), b.get("yMin")));
a.put("xMax", Math.max(a.get("xMax"), b.get("xMax")));
a.put("yMax", Math.max(a.get("yMax"), b.get("yMax")));
return a;
};
}
@Override
public Set<Characteristics> characteristics() {
Set<Characteristics> characteristics = new HashSet<>();
characteristics.add(Characteristics.UNORDERED);
characteristics.add(Characteristics.CONCURRENT);
characteristics.add(Characteristics.IDENTITY_FINISH);
return characteristics;
}
}
Results
I gave all of them a try and compared the results. Good news: for all of them, I got the same result as far as the values are concerned!
With respect to the speed, here is the ranking:
Number 2 and 3 are actually very close in terms of speed. The parallel version is probably slower because my dataset is too small.
Upvotes: 2
Reputation: 131346
You could divide by two the iterations with summaryStatistics()
while keeping a straight code :
IntSummaryStatistics stat = points.stream().mapToInt(point -> point.x).summaryStatistics();
int minX = stat.getMin();
int maxX = stat.getMax();
And do the same thing with point.y
.
You could factor out in this way :
Function<ToIntFunction<Point>, IntSummaryStatistics> statFunction =
intExtractor -> points.stream()
.mapToInt(p -> intExtractor.applyAsInt(pp))
.summaryStatistics();
IntSummaryStatistics statX = statFunction.apply(p -> p.x);
IntSummaryStatistics statY = statFunction.apply(p -> p.y);
A custom collector is a possibility but note that you should implement the combiner part that will make your code harder to read.
So but if you need to use parallel stream you should stay to the imperative way.
While you could improve your actual code by relying the Math.min
and Math.max
functions :
for (Point p : points) {
minX = Math.min(p.x, minX);
minY = Math.min(p.y, minY);
maxY = Math.max(p.x, maxX);
maxY = Math.max(p.y, maxY);
}
Upvotes: 9
Reputation: 44398
You are able to use 2 Streams using Stream::reduce
to get a point with a minimum and a point with a maximum. I don't recommend to concatenate the results to a one stream since there might be hard to distinguish the difference between the minimum, maximum and the coordinates.
Point min = points
.stream()
.reduce((l, r) -> new Point(Math.min(l.y, r.y), Math.min(l.y, r.y))
.orElse(new Point(-1, -1));
Point max = points
.stream()
.reduce((l, r) -> new Point(Math.max(l.y, r.y), Math.max(l.y, r.y))
.orElse(new Point(-1, -1));
As the BinaryOperator<Point>
use the two subsequent Points
and a ternary operator to find out the minimum/maximum which is passed to a new object Point
and returned using Optional::orElse
with the default -1, -1
coordinates.
Upvotes: 1